Posted to issues@hbase.apache.org by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org> on 2010/11/24 20:45:14 UTC

[jira] Created: (HBASE-3276) delete followed by a put with the same timestamp

delete followed by a put with the same timestamp
------------------------------------------------

                 Key: HBASE-3276
                 URL: https://issues.apache.org/jira/browse/HBASE-3276
             Project: HBase
          Issue Type: Bug
            Reporter: Kannan Muthukkaruppan


[Note: This issue is relevant only for cases that don't use the default "time" based versions, but provide/manage versions explicitly.]

The fix for HBASE-1485 ensures that if there are multiple puts with the same timestamp the later one wins.

However, if there is a delete for a specific timestamp, then the later put doesn't win. 

Say for example the following is the sequence of operations:

put             row/col/v1 - value1
deleteColumn    row/col/v1
put             row/col/v1 - value2

Without the deleteColumn(), HBASE-1485 ensures that "value2" is the winner.

However, with the deleteColumn() thrown into the mix, the delete wins, and one cannot insert a new value at that version. [The only, unsatisfactory, workaround at this point seems to be to trigger a major compaction. The major compaction would clear the delete marker, and allow new cells to be created with that version again.]
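The behaviour described above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions, not HBase code: the class `SameTimestampDeleteDemo`, its `Cell` type, and the `seq` field (standing in for insertion order) are all hypothetical, and the comparator is a simplification in which a delete marker sorts ahead of every put at the same timestamp.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified model of the read path; every name here is hypothetical, not HBase API.
class SameTimestampDeleteDemo {
    enum Type { PUT, DELETE }

    static final class Cell {
        final String row, col, value;
        final long ts, seq;   // seq = insertion order, standing in for memstoreTS
        final Type type;
        Cell(String row, String col, long ts, Type type, String value, long seq) {
            this.row = row; this.col = col; this.ts = ts;
            this.type = type; this.value = value; this.seq = seq;
        }
    }

    // Type-aware key order: newer timestamps first; at an equal timestamp a
    // DELETE sorts ahead of every PUT; fully equal keys put the newest insert first.
    static int compare(Cell a, Cell b) {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.col.compareTo(b.col);
        if (c == 0) c = Long.compare(b.ts, a.ts);
        if (c == 0) c = Integer.compare(rank(a.type), rank(b.type));
        if (c == 0) c = Long.compare(b.seq, a.seq);
        return c;
    }

    static int rank(Type t) { return t == Type.DELETE ? 0 : 1; }

    // Scan in key order: a delete marker hides every put at the same timestamp,
    // no matter whether the put was inserted before or after the delete.
    static Optional<String> read(List<Cell> cells, String row, String col) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(SameTimestampDeleteDemo::compare);
        Set<Long> deleted = new HashSet<>();
        for (Cell c : sorted) {
            if (!c.row.equals(row) || !c.col.equals(col)) continue;
            if (c.type == Type.DELETE) deleted.add(c.ts);
            else if (!deleted.contains(c.ts)) return Optional.of(c.value);
        }
        return Optional.empty();
    }

    // Replays the sequence from the report, with or without the delete.
    static Optional<String> replay(boolean withDelete) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("row", "col", 1L, Type.PUT, "value1", 1));
        if (withDelete) cells.add(new Cell("row", "col", 1L, Type.DELETE, null, 2));
        cells.add(new Cell("row", "col", 1L, Type.PUT, "value2", 3));
        return read(cells, "row", "col");
    }

    public static void main(String[] args) {
        System.out.println("without delete: " + replay(false));
        System.out.println("with delete:    " + replay(true));
    }
}
```

In this model `replay(false)` yields "value2", while `replay(true)` yields an empty result: the later put is hidden because ordering by type places the delete marker first regardless of insertion order.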

---

Seems like it might not be too complicated to extend the fix for HBASE-1485 to also respect ordering between delete/put operations. I'll look into this further.








-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12936128#action_12936128 ] 

HBase Review Board commented on HBASE-3276:
-------------------------------------------

Message from: "Ryan Rawson" <ry...@gmail.com>


bq.  On 2010-11-26 14:54:45, Ryan Rawson wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 1373
bq.  > <http://review.cloudera.org/r/1252/diff/1/?file=17712#file17712line1373>
bq.  >
bq.  >     What are all the consequences of not sorting by type when using KVComparator?  Does this mean we might create HFiles that are not sorted properly, because the HFile comparator uses the KeyComparator directly with ignoreType = false?
bq.  >     
bq.  >     While in memstore we can rely on memstoreTS to roughly order by insertion time, and the Put/Delete should probably work in that situation, you are talking about modifying a pretty core and important concept in how we sort things.
bq.  >     
bq.  >     There are other ways to reconcile bugs like this, one of them is to extend the memstoreTS concept into the HFile and use that to reconcile during reads.  There is another JIRA where I proposed this.  
bq.  >     
bq.  >     If we are talking about 0.92 and beyond I'd prefer building a solid base rather than dangerous hacks like this.  Our unit tests are not extremely extensive, so while they might pass, that doesn't guarantee lack of bad behaviour later on.
bq.  >
bq.  
bq.  Pranav Khaitan wrote:
bq.      Agree. As I mentioned, this is a major change and more thought needs to be given to it.
bq.      
bq.      However, to resolve issues like HBASE-3276, we need either such a change or extend the memstoreTS concept to HFile as you mentioned.
bq.      
bq.      About consequences, I don't see anything negative here. This change only affects the sorting of keys having same row, col, timestamp. After this change, all keys with the same row, col, ts will be sorted purely based on the order in which they were inserted. When a memstore is flushed to HFile, the memstoreTS takes care of ordering. During compactions, the KeyValueHeap breaks ties by using the sequence ids of storefiles.

The problem is that you are now changing how things are ordered sometimes, but not all the time.  HFile uses the raw comparator directly, instantiating it itself rather than getting it via the code path you changed.  So now you create a memstore in this order:

row,col,100,Put  (memstoreTS=1)
row,col,100,Delete (memstoreTS=2)
row,col,100,Put (memstoreTS=3)

But the HFile comparator will consider this out of order since it doesn't know about memstoreTS and it still expects things to be in a certain order.

I'm a little wary of having implicit ordering in the HFiles... in your new scheme, Put,Delete,Put are in that order 'just because they are', and the comparator cannot put them back in order, and must rely on scanner order.  During compactions we would place keys in order based on which files they came from, but they wouldn't themselves have an order.  Basically we should get rid of 'type sorting' and use memstoreTS sorting in memory and implicit sorting in the HFiles.  


- Ryan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1252/#review1993
-----------------------------------------------------------





> delete followed by a put with the same timestamp
> ------------------------------------------------
>
>                 Key: HBASE-3276
>                 URL: https://issues.apache.org/jira/browse/HBASE-3276
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> [Note: This issue is relevant only for cases that don't use the default "time" based versions, but provide/manage versions explicitly.]
> The fix for HBASE-1485 ensures that if there are multiple puts with the same timestamp the later one wins.
> However, if there is a delete for a specific timestamp, then the later put doesn't win. 
> Say for example the following is the sequence of operations:
> put             row/col/v1 - value1
> deleteColumn    row/col/v1
> put             row/col/v1 - value2
> Without the deleteColumn(), HBASE-1485 ensures that "value2" is the winner.
> However, with the deleteColumn() thrown into the mix, the delete wins, and one cannot insert a new value at that version. [The only, unsatisfactory, workaround at this point seems to be to trigger a major compaction. The major compaction would clear the delete marker, and allow new cells to be created with that version again.]
> ---
> Seems like it might not be too complicated to extend the fix for HBASE-1485 to also respect ordering between delete/put operations. I'll look into this further.



[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "Pranav Khaitan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935917#action_12935917 ] 

Pranav Khaitan commented on HBASE-3276:
---------------------------------------

Kannan, this is easy. I can take care of this.



[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12936131#action_12936131 ] 

HBase Review Board commented on HBASE-3276:
-------------------------------------------

Message from: "Pranav Khaitan" <pr...@gmail.com>



Right. I see that HFile does an extra check and throws an IOException when it gets data out of order. So if we go forward with this change, we will have to ensure that the comparator used by HFile knows about it. This can be achieved in two ways: first, by setting the default value of ignoreType = true. Alternatively, the HFile can explicitly set ignoreType = true.
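To make the ordering check concrete, here is a minimal sketch of a writer-style check that rejects any key sorting before its predecessor. It is an illustration under assumptions, not the HFile implementation: the class `AppendOrderCheckDemo`, its `Key` type, and the `ignoreType` parameter on `compare` are hypothetical stand-ins for the comparator flag discussed in this thread.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Simplified model of a sorted-file writer; names are hypothetical, not HBase API.
class AppendOrderCheckDemo {
    enum Type { PUT, DELETE }

    static final class Key {
        final String row, col;
        final long ts;
        final Type type;
        Key(String row, String col, long ts, Type type) {
            this.row = row; this.col = col; this.ts = ts; this.type = type;
        }
    }

    // Compares keys; with ignoreType set, keys that differ only in type compare as equal.
    static int compare(Key a, Key b, boolean ignoreType) {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.col.compareTo(b.col);
        if (c == 0) c = Long.compare(b.ts, a.ts);   // newer timestamp first
        if (c == 0 && !ignoreType) c = Integer.compare(rank(a.type), rank(b.type));
        return c;
    }

    // Assumption of this model: a delete sorts ahead of a put at the same timestamp.
    static int rank(Type t) { return t == Type.DELETE ? 0 : 1; }

    // Mimics a writer that refuses a key that sorts before the previous one.
    static void appendAll(List<Key> keys, boolean ignoreType) throws IOException {
        Key prev = null;
        for (Key k : keys) {
            if (prev != null && compare(prev, k, ignoreType) > 0) {
                throw new IOException("key out of order: " + k.type + " after " + prev.type);
            }
            prev = k;
        }
    }

    static boolean accepts(List<Key> keys, boolean ignoreType) {
        try {
            appendAll(keys, ignoreType);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // The memstore order from the example in this thread: Put, Delete, Put at ts=100.
    static List<Key> insertionOrder() {
        return Arrays.asList(
            new Key("row", "col", 100L, Type.PUT),
            new Key("row", "col", 100L, Type.DELETE),
            new Key("row", "col", 100L, Type.PUT));
    }

    public static void main(String[] args) {
        System.out.println("type-aware check accepts:    " + accepts(insertionOrder(), false));
        System.out.println("type-ignoring check accepts: " + accepts(insertionOrder(), true));
    }
}
```

With the type-aware comparison the Put, Delete, Put sequence is rejected at the second key; with the type ignored, the three keys compare as equal and the sequence passes, which is why both the writer and the in-memory sort would have to agree on the flag.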


- Pranav




[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12936123#action_12936123 ] 

HBase Review Board commented on HBASE-3276:
-------------------------------------------

Message from: "Pranav Khaitan" <pr...@gmail.com>



Agree. As I mentioned, this is a major change and more thought needs to be given to it.

However, to resolve issues like HBASE-3276, we need either such a change or extend the memstoreTS concept to HFile as you mentioned.

About consequences, I don't see anything negative here. This change only affects the sorting of keys having same row, col, timestamp. After this change, all keys with the same row, col, ts will be sorted purely based on the order in which they were inserted. When a memstore is flushed to HFile, the memstoreTS takes care of ordering. During compactions, the KeyValueHeap breaks ties by using the sequence ids of storefiles. 


- Pranav




[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965911#action_12965911 ] 

ryan rawson commented on HBASE-3276:
------------------------------------

I'm worried that an implicit ordering opens us to problems in the future.  The kind that involve "i lost my data and there is no way to recover it". 

To that end, I propose we implement HBASE-2856, specifically my comment https://issues.apache.org/jira/browse/HBASE-2856?focusedCommentId=12899119&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12899119 

which talks about bringing the memstoreTS (ish, an equivalent but not quite identical value) down into the HFile.  It will have many benefits, including fixing this JIRA, and also fixing the ACID stuff that has been waylaid for lack of this change.



[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12936040#action_12936040 ] 

HBase Review Board commented on HBASE-3276:
-------------------------------------------

Message from: "Pranav Khaitan" <pr...@gmail.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1252/
-----------------------------------------------------------

Review request for hbase, Jonathan Gray and Kannan Muthukkaruppan.


Summary
-------

This is a design change suggested in HBASE-3276 so adequate thought should be given before proceeding. 

The main code change is just one line, which is to ignore the key type while doing KV comparisons. When the key type is ignored, all the keys for the same timestamp are sorted according to the order in which they were inserted. It is still ensured that the delete family and delete column will be at the top because they have the default column name and default timestamp.


This addresses bug HBASE-3276.
    http://issues.apache.org/jira/browse/HBASE-3276


Diffs
-----

  trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 1039233 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1039233 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreScanner.java 1039233 

Diff: http://review.cloudera.org/r/1252/diff


Testing
-------

Test cases added. Since there is a change in semantics, some previous tests were failing because of this change. Those tests have been modified to test the newer behavior.


Thanks,

Pranav






[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964880#action_12964880 ] 

HBase Review Board commented on HBASE-3276:
-------------------------------------------

Message from: stack@duboce.net



@Ryan, you say "Basically we should get rid of 'type sorting' and use memstoreTS sorting in memory and implicit sorting in the HFiles."  You think this is a recipe we should adopt going forward?   Giving it cursory thought, it would seem like it should work.   What about migrating data that was sorted using the current KV comparator?  Do we need to migrate files made using the old sort order?  Should we mark files that have this new ordering as type 2 files?


- stack




[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965900#action_12965900 ] 

Kannan Muthukkaruppan commented on HBASE-3276:
----------------------------------------------

On flushes, if we did what a minor compaction now does (after HBASE-3048), i.e. process TTL/versions/overwrites etc., then an HFile would never contain a value that should be suppressed.

And with regard to multiple HFiles containing conflicting data (i.e. corresponding to the same TS), we could use the "sequenceId" of the HFile to resolve the winner. (The HBASE-1485 fix also relies on sequenceId ordering of HFiles to resolve winners between entries coming from multiple files.)
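The two-part idea above can be sketched roughly as follows. This is a hypothetical model, not HBase code: `SequenceIdTieBreakDemo`, `StoreFile`, and the convention that a null value stands for a delete marker are assumptions made for illustration, as is the assumption that a higher sequence id means a newer file. Each file holds at most one entry per key, mirroring the premise that a flush has already resolved overwrites within the file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of resolving same-key conflicts across files by sequence id.
class SequenceIdTieBreakDemo {
    static final class StoreFile {
        final long sequenceId;             // assumed: higher means flushed later
        final Map<String, String> cells;   // key "row/col/ts" -> value, null = delete marker
        StoreFile(long sequenceId, Map<String, String> cells) {
            this.sequenceId = sequenceId;
            this.cells = cells;
        }
    }

    // Convenience builder for a file holding a single entry.
    static StoreFile file(long sequenceId, String key, String value) {
        Map<String, String> cells = new HashMap<>();
        cells.put(key, value);
        return new StoreFile(sequenceId, cells);
    }

    // For each key, the entry from the file with the highest sequence id wins.
    static Map<String, String> resolve(List<StoreFile> files) {
        List<StoreFile> oldestFirst = new ArrayList<>(files);
        oldestFirst.sort((a, b) -> Long.compare(a.sequenceId, b.sequenceId));
        Map<String, String> latest = new TreeMap<>();
        for (StoreFile f : oldestFirst) {
            latest.putAll(f.cells);                  // newer file overwrites older entries
        }
        latest.values().removeIf(v -> v == null);    // drop keys whose winner is a delete
        return latest;
    }

    public static void main(String[] args) {
        List<StoreFile> files = new ArrayList<>();
        files.add(file(10, "row/col/v1", "value1"));
        files.add(file(20, "row/col/v1", null));     // the delete
        files.add(file(30, "row/col/v1", "value2"));
        System.out.println(resolve(files));
    }
}
```

In this model the put in the newest file survives the older delete, and if the delete were in the newest file instead, the key would be absent from the result.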




[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935604#action_12935604 ] 

Kannan Muthukkaruppan commented on HBASE-3276:
----------------------------------------------

I wrote: <<< [Note: This issue is relevant only for cases that don't use the default "time" based versions, but provide/manage versions explicitly.]>>>

On second thought, this should also help cases where a delete followed by a put arrive within the same millisecond.







[jira] Assigned: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan reassigned HBASE-3276:
--------------------------------------------

    Assignee: Kannan Muthukkaruppan



[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12936119#action_12936119 ] 

HBase Review Board commented on HBASE-3276:
-------------------------------------------

Message from: "Ryan Rawson" <ry...@gmail.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1252/#review1993
-----------------------------------------------------------



trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
<http://review.cloudera.org/r/1252/#comment6297>

    What are all the consequences of not sorting by type when using KVComparator? Does this mean we might create HFiles that are not sorted properly, because the HFile comparator uses the KeyComparator directly with ignoreType = false?
    
    While in memstore we can rely on memstoreTS to roughly order by insertion time, and the Put/Delete should probably work in that situation, you are talking about modifying a pretty core and important concept in how we sort things.
    
    There are other ways to reconcile bugs like this; one of them is to extend the memstoreTS concept into the HFile and use that to reconcile during reads. There is another JIRA where I proposed this.
    
    If we are talking about 0.92 and beyond, I'd prefer building a solid base rather than dangerous hacks like this. Our unit tests are not extremely extensive, so while they might pass, that doesn't guarantee the absence of bad behaviour later on.
    


- Ryan







[jira] Commented: (HBASE-3276) delete followed by a put with the same timestamp

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12936134#action_12936134 ] 

HBase Review Board commented on HBASE-3276:
-------------------------------------------

Message from: "Pranav Khaitan" <pr...@gmail.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1252/
-----------------------------------------------------------

(Updated 2010-11-26 16:02:47.462170)


Review request for hbase, Jonathan Gray and Kannan Muthukkaruppan.


Summary (updated)
-------

This is a design change suggested in HBASE-3276 so adequate thought should be given before proceeding. 

The main code change is just one line, which ignores the key type when doing KV comparisons. When the key type is ignored, all the keys for the same timestamp are sorted according to the order in which they were inserted. It is still ensured that the delete family and delete column will be at the top because they have the default column name and default timestamp.
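
The effect on the put / deleteColumn / put sequence from the issue can be sketched with a toy model. This is not the actual patch to KeyValue.java: Cell, its seq field (standing in for insertion order) and the compare() and read() helpers are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Cell, seq and compare() are hypothetical stand-ins,
// not HBase's KeyValue or KVComparator.
class SameTimestampOrdering {
    enum Type { PUT, DELETE }

    static final class Cell {
        final String column; // row and column collapsed into one string
        final long ts;       // user-supplied version
        final Type type;
        final long seq;      // insertion order
        final String value;  // null for delete markers

        Cell(String column, long ts, Type type, long seq, String value) {
            this.column = column;
            this.ts = ts;
            this.type = type;
            this.seq = seq;
            this.value = value;
        }
    }

    // Orders by column, then newest timestamp first. When ignoreType is
    // false, delete markers sort ahead of puts at the same timestamp.
    // The final tie-break is newest insert first.
    static int compare(Cell a, Cell b, boolean ignoreType) {
        int c = a.column.compareTo(b.column);
        if (c != 0) return c;
        c = Long.compare(b.ts, a.ts);
        if (c != 0) return c;
        if (!ignoreType) {
            c = Boolean.compare(b.type == Type.DELETE, a.type == Type.DELETE);
            if (c != 0) return c;
        }
        return Long.compare(b.seq, a.seq);
    }

    // Returns the visible value at (column, ts), or null if the first cell
    // in sort order for that coordinate is a delete marker or none exists.
    static String read(List<Cell> cells, String column, long ts, boolean ignoreType) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort((a, b) -> compare(a, b, ignoreType));
        for (Cell c : sorted) {
            if (c.column.equals(column) && c.ts == ts) {
                return c.type == Type.DELETE ? null : c.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row/col", 1, Type.PUT, 1, "value1"),
            new Cell("row/col", 1, Type.DELETE, 2, null),
            new Cell("row/col", 1, Type.PUT, 3, "value2"));
        System.out.println(read(cells, "row/col", 1, false)); // type-sorted
        System.out.println(read(cells, "row/col", 1, true));  // type ignored
    }
}
```

In this model the type-sorted read finds the delete marker first and returns nothing, while the type-ignoring read finds the most recently inserted cell, the second put, and returns "value2".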


This addresses bug HBASE-3276.
    http://issues.apache.org/jira/browse/HBASE-3276


Diffs
-----

  trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 1039233 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1039233 
  trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreScanner.java 1039233 

Diff: http://review.cloudera.org/r/1252/diff


Testing
-------

Test cases added. Since there is a change in semantics, some previous tests were failing because of this change. Those tests have been modified to test the newer behavior.


Thanks,

Pranav



