Posted to issues@hbase.apache.org by "Amitanand Aiyer (Created) (JIRA)" <ji...@apache.org> on 2012/01/21 03:00:39 UTC

[jira] [Created] (HBASE-5241) Deletes should not mask Puts that come after it.

Deletes should not mask Puts that come after it.
------------------------------------------------

                 Key: HBASE-5241
                 URL: https://issues.apache.org/jira/browse/HBASE-5241
             Project: HBase
          Issue Type: Improvement
            Reporter: Amitanand Aiyer
            Priority: Minor


Suppose that we have a delete row, and then followed by the put. The delete row
can mask the put, unless there was a major compaction in between.

Now that we are flushing the memstoreTS to disk, along with the KVs, we should be able
to differentiate whether or not the Put happened after the Delete and offer better
delete semantics.
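
The masking described above can be sketched with a toy model of a single row/column. This is only an illustration, not HBase code: the names read_latest, major_compact and scenario are hypothetical, and it assumes the Put carries a timestamp that is not newer than the Delete's (for example a client-supplied or back-dated timestamp).

```python
def latest_delete_ts(cells):
    """Highest timestamp among delete markers, or -1 if there are none."""
    return max((ts for kind, ts, _ in cells if kind == "delete"), default=-1)


def read_latest(cells):
    """Timestamp-only rule: a delete masks every put with ts <= its own ts,
    regardless of the order in which the cells were written."""
    delete_ts = latest_delete_ts(cells)
    live = [(ts, value) for kind, ts, value in cells
            if kind == "put" and ts > delete_ts]
    return max(live)[1] if live else None


def major_compact(cells):
    """Toy major compaction: drop delete markers and the puts they mask."""
    delete_ts = latest_delete_ts(cells)
    return [c for c in cells if c[0] == "put" and c[1] > delete_ts]


def scenario(compact_in_between):
    cells = [("delete", 10, None)]        # delete row at ts=10
    if compact_in_between:
        cells = major_compact(cells)      # the delete marker is gone
    cells.append(("put", 5, "v"))         # written later, but dated ts=5
    return read_latest(cells)


print(scenario(compact_in_between=False))  # None: the put is masked
print(scenario(compact_in_between=True))   # v: the put survives
```

The same two operations give different read results depending only on whether a major compaction ran in between, which is the behaviour this issue is about.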

Couldn't find a pre-existing JIRA that already discusses this, so creating one.

Seems related to https://issues.apache.org/jira/browse/HBASE-2406, but is not quite the same.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221817#comment-13221817 ] 

Phabricator commented on HBASE-5241:
------------------------------------

aaiyer has commented on the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:223 mbautin, kannan: Just thinking aloud ...

    If we are able to keep track, at flush time, of the Rows/Rows+Col where we see that a DeleteColumn/DeleteFamily is followed by a Put/KV with a higher memstoreTS, we might be able to skip ahead to getNextRowOrNextColumn as earlier, for almost all cases except ones where there actually was a back-fill.

   Would it be possible, given the hfilev2 structure, to be able to add more kinds of bloom blocks to keep track of this information?


REVISION DETAIL
  https://reviews.facebook.net/D1731

                
> Deletes should not mask Puts that come after it.
> ------------------------------------------------
>
>                 Key: HBASE-5241
>                 URL: https://issues.apache.org/jira/browse/HBASE-5241
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Amitanand Aiyer
>            Assignee: Amitanand Aiyer
>         Attachments: HBASE-5241.D1731.1.patch, HBASE-5241.D1731.2.patch, HBASE-5241.D1731.3.patch
>
>
> Suppose that we have a delete row, and then followed by the put. The delete row
> can mask the put, unless there was a major compaction in between.
> Now that we are flushing the memstoreTS to disk, along with the KVs, we should be able
> to differentiate whether or not the Put happened after the Delete and offer better 
> delete semantics.
> Couldn't find a pre-existing JIRA that already discusses this, so creating one.
> Seems related to https://issues.apache.org/jira/browse/HBASE-2406, but is not quite the same.

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Amitanand Aiyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209079#comment-13209079 ] 

Amitanand Aiyer commented on HBASE-5241:
----------------------------------------

@Lars: Was discussing the replication issue a little more with Kannan. It does seem like there may be corner cases in which the exact order does matter, even if we leave things the way they are and clients do not take control of the timestamp.

Say, for example, we have two Puts from the client side. If both hit the server in quick succession, they could both be issued the same millisecond timestamp. Which one of them wins will then be entirely determined by the order in which they are applied. If we apply them in a different order, we could end up with different values.

It seems to me that to ensure determinism in terms of what the clients see, it would be crucial to have an internal timestamp that orders every operation (using something like Lamport's logical clock, instead of the real clock).

I do agree with you that it would be nice if timestamp were considered an internal detail that clients don't take control of. But, we would still have to include memstoreTS or Log Seq Id to ensure deterministic replay.
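
The same-millisecond case can be sketched as follows. This is a toy model under stated assumptions, not the HBase write path: winner_by_arrival and winner_by_sequence are hypothetical names, and the sequence number stands in for something like memstoreTS or the log sequence id assigned once at the origin.

```python
def winner_by_arrival(puts):
    """puts: (ts, value) in the order they are applied.
    With equal timestamps, whichever put is applied last wins."""
    best = None
    for ts, value in puts:
        if best is None or ts >= best[0]:
            best = (ts, value)
    return best[1]


def winner_by_sequence(puts):
    """puts: (ts, seq, value); seq is a logical clock value that travels
    with the put, so it breaks timestamp ties the same way on every replay."""
    return max(puts, key=lambda p: (p[0], p[1]))[2]


a, b = (1000, "x"), (1000, "y")           # same millisecond timestamp
print(winner_by_arrival([a, b]))           # y
print(winner_by_arrival([b, a]))           # x: replay order changed the result

p1, p2 = (1000, 1, "x"), (1000, 2, "y")   # same timestamp, distinct seq
print(winner_by_sequence([p1, p2]))        # y
print(winner_by_sequence([p2, p1]))        # y: independent of replay order
```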
                
        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Amitanand Aiyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208773#comment-13208773 ] 

Amitanand Aiyer commented on HBASE-5241:
----------------------------------------

After some discussion, we figured that skipping to the next column vs. skipping to the next KV shouldn't make much of a difference. So we are headed in this direction (as opposed to trying to create indices or change the sort order).


Will need to see once the patch is complete, if the read performance gets affected significantly.
                
        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221653#comment-13221653 ] 

Phabricator commented on HBASE-5241:
------------------------------------

aaiyer has commented on the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:609 It seems like going for stronger read semantics would disable the space savings we get by storing the memstoreTS as a (1 byte) Zero value, instead of the actual value.

  Perhaps, one way to get similar savings would be to store the actual memstoreTS in the variable length encoding, after differential encoding.

   Here is what I propose.
      Keep track of a per-StoreFile startMemstoreTS value, that (approximately) keeps track of the smallest memstoreTS in the file.

     KV's will store the deltas such that KV's-memstoreTS = StoreFile's-startMemstoreTS + KV-delta.

     If the delta is small enough, we will only use 1 or 2 bytes for storing it. Since we use Bytes.writeVLong: From the java docs:

  if n in [-32, 127): encode in one byte with the actual value. Otherwise,
  if n in [-20*2^8, 20*2^8): encode in two bytes: byte[0] = n/256 - 52; byte[1]=n&0xff. Otherwise,
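
A rough sketch of the space argument, using only the two size ranges quoted above. The helper names vlong_len and delta_encode are hypothetical, the exact minimum is used where the proposal only asks for an approximate startMemstoreTS, and longer encodings are reported as None because the quote is cut off before describing them.

```python
def vlong_len(n):
    """Encoded size in bytes according to the two ranges quoted above.
    Returns None when the value needs more than two bytes."""
    if -32 <= n < 127:
        return 1
    if -20 * 2**8 <= n < 20 * 2**8:
        return 2
    return None


def delta_encode(memstore_ts_values):
    """Return (startMemstoreTS, deltas) so that ts == start + delta."""
    start = min(memstore_ts_values)
    return start, [ts - start for ts in memstore_ts_values]


values = [1_000_000, 1_000_050, 1_003_000]
start, deltas = delta_encode(values)
print(start, deltas)                       # 1000000 [0, 50, 3000]
print([vlong_len(d) for d in deltas])      # [1, 1, 2]
print([vlong_len(v) for v in values])      # [None, None, None]
```

In this model the raw values all fall outside the one- and two-byte ranges, while the deltas fit in one or two bytes as long as a file's memstoreTS values stay close to its start value.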


REVISION DETAIL
  https://reviews.facebook.net/D1731

                
        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208117#comment-13208117 ] 

Lars Hofhansl commented on HBASE-5241:
--------------------------------------

This entire approach seems wrong to me. Things in HBase have a timestamp, and one of the nicest parts about HBase is that the actual order in which operations are applied does not matter.

This will break replication, where operations can arrive out of order, as well as other code in HBase.

Unless somebody provides a very compelling use case I'm -1 on this general direction.

                
        

[jira] [Updated] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5241:
-------------------------------

    Attachment: HBASE-5241.D1731.1.patch

aaiyer requested code review of "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".
Reviewers: JIRA

  https://issues.apache.org/jira/browse/HBASE-5241

  [HBASE-5241] Puts should not be masked by prior Deletes

  Initial version of the patch. Getting it out to start
  discussion/get feedback on a few different ways to address this.

  There are test failures.

  Suppose that we have a delete row, and then followed by the put. The delete row
  can mask the put, unless there was a major compaction in between.

  Now that we are flushing the memstoreTS to disk, along with the KVs, we should be able
  to differentiate whether or not the Put happened after the Delete and offer better
  delete semantics.

  Couldn't find a pre-existing JIRA that already discusses this, so creating one.

  Seems related to https://issues.apache.org/jira/browse/HBASE-2406, but is not quite the same.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D1731

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScanDeleteTracker.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/DeleteTracker.java
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209068#comment-13209068 ] 

Lars Hofhansl commented on HBASE-5241:
--------------------------------------

Skipping to the next column can be much more efficient than skipping to the next KV if there are many versions. In fact stack had filed a bug about it (can't find) and you FB folks put in the fix to skip to the next column.
Sorry to be the party killer here, but let's please not do this. :(
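
The efficiency point about many versions can be illustrated with a toy sorted list of KVs. This is only a model under stated assumptions: build, steps_next_kv and index_next_column are hypothetical names, and a binary search stands in for whatever seek mechanism the real scanner uses.

```python
import bisect


def build(columns, versions):
    """KVs sorted by column, then by version index (0 = newest)."""
    return [(c, v) for c in columns for v in range(versions)]


def steps_next_kv(kvs, start, column):
    """Advance one KV at a time until the column changes.
    Returns the number of KVs examined."""
    i = start
    while i < len(kvs) and kvs[i][0] == column:
        i += 1
    return i - start


def index_next_column(kvs, column):
    """Seek directly to the first KV after `column` with a binary search."""
    return bisect.bisect_right(kvs, (column, float("inf")))


kvs = build(["a", "b"], versions=1000)
print(steps_next_kv(kvs, 0, "a"))          # 1000 KVs touched one by one
print(index_next_column(kvs, "a"))         # 1000, found in about log2(2000) probes
print(kvs[index_next_column(kvs, "a")])    # ('b', 0)
```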

                
        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212214#comment-13212214 ] 

Todd Lipcon commented on HBASE-5241:
------------------------------------

{quote}
Now that we are flushing the memstoreTS to disk, along with the KVs, we should be able
to differentiate whether or not the Put happened after the Delete and offer better 
delete semantics.
{quote}

What are the "better semantics" that we would offer? i.e., if I do:
- put value "a" at ts=1
- delete at ts=3
- put value "b" at ts=2

and I do a read with "current time" semantics, do you expect to see "b" or nothing? I'm not convinced that "b" is a "better semantic" here, except for the point that it makes major compaction more transparent. The transparency of compaction is sort of nice, but compaction is already not transparent because of time travel reads (except for the "always keep versions" stuff that we did recently)
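
The three-operation example above can be run through both rules in a small model. This is a sketch, not the patch: read_timestamp_rule and read_arrival_rule are hypothetical names, and the position in the list stands in for memstoreTS (arrival order).

```python
def read_timestamp_rule(ops):
    """ops: (kind, ts, value). A delete masks every put with ts <= its ts;
    arrival order is ignored."""
    delete_ts = max((ts for kind, ts, _ in ops if kind == "delete"), default=-1)
    live = [(ts, v) for kind, ts, v in ops if kind == "put" and ts > delete_ts]
    return max(live)[1] if live else None


def read_arrival_rule(ops):
    """A delete masks a put only if the put has ts <= the delete's ts AND
    arrived before the delete (lower position in ops)."""
    deletes = [(seq, ts) for seq, (kind, ts, _) in enumerate(ops)
               if kind == "delete"]
    live = []
    for seq, (kind, ts, value) in enumerate(ops):
        if kind != "put":
            continue
        masked = any(ts <= d_ts and seq < d_seq for d_seq, d_ts in deletes)
        if not masked:
            live.append((ts, value))
    return max(live)[1] if live else None


ops = [("put", 1, "a"), ("delete", 3, None), ("put", 2, "b")]
print(read_timestamp_rule(ops))   # None: nothing is visible
print(read_arrival_rule(ops))     # b: the later put is no longer masked
```

If the same three operations arrive with the delete last, both rules return nothing, so under the arrival rule the result depends on the order of application.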
                
        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211190#comment-13211190 ] 

Lars Hofhansl commented on HBASE-5241:
--------------------------------------

What about the fact that currently Deletes and Puts are idempotent? With this change a failed Put or Delete cannot just be redone, because the effect might be different.
                
        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208768#comment-13208768 ] 

Phabricator commented on HBASE-5241:
------------------------------------

aaiyer has commented on the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:74 fixing this in the next version.
  will update, once I fix the tests as well.

  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:189 I guess -1 or 0, both would work.
  It seems to be initialized to -1L. But used to get reset to 0 on reset. That didn't make sense.
  Chose one at random.

  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java:202 This is one perf penalty we pay for the better consistency semantics.
  We can only zero-out memstoreTS upon major compaction. Not when all readers get past the read point.

  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java:845 that was the initial direction.
  I'm working on keeping things backward compatible. so this will get reverted.

REVISION DETAIL
  https://reviews.facebook.net/D1731

                
        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Amitanand Aiyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209083#comment-13209083 ] 

Amitanand Aiyer commented on HBASE-5241:
----------------------------------------

Yeah, I'm worried about the performance slowdown as well.
                
        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208765#comment-13208765 ] 

Lars Hofhansl commented on HBASE-5241:
--------------------------------------

I fear we'd have two chronological dimensions now. One as indicated by the timestamps and another indicated by the order in which the changes are physically applied (memstoreTS).

This "problem" is really only a problem when Deletes are dated into the future or Puts are dated in the past. Any app doing this must be aware of the implications.
It just seems like a non-issue to me :)

Replication just happens to be a place where I can see problems. I'm sure there're more (multi actions, etc).
Is the memstoreTS written to the WAL? (Replication uses WAL shipping).

                
        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Amitanand Aiyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208757#comment-13208757 ] 

Amitanand Aiyer commented on HBASE-5241:
----------------------------------------

@Lars. Sure. I see there can be issues with replication.

Is that something that cannot be fixed?  

I am not really familiar with the replication code path. But, say, if we ship the memstoreTS along with the KVs during replication, would that not take care of the out-of-order issue?
                
        

[jira] [Updated] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5241:
------------------------------

     Description: 
Suppose that we have a delete row, and then followed by the put. The delete row
can mask the put, unless there was a major compaction in between.

Now that we are flushing the memstoreTS to disk, along with the KVs, we should be able
to differentiate whether or not the Put happened after the Delete and offer better 
delete semantics.

Couldn't find a pre-existing JIRA that already discusses this, so creating one.

Seems related to https://issues.apache.org/jira/browse/HBASE-2406, but is not quite the same.



  was:
Suppose that we have a delete row, and then followed by the put. The delete row
can mask the put, unless there was a major compaction in between.

Now that we are flushing the memstoreTS to disk, along with the KVs, we should be able
to differentiate weather or not the Put happened after the Delete and offer better 
delete semantics.

Couldn't find a pre-existing JIRA that already discusses this, so creating one.

Seems related to https://issues.apache.org/jira/browse/HBASE-2406, but is not quiet the same.



        Priority: Major  (was: Minor)
    Hadoop Flags: Incompatible change

This is a nice initiative.
                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221710#comment-13221710 ] 

Phabricator commented on HBASE-5241:
------------------------------------

tedyu has commented on the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:609 I think this suggestion is a good idea.

REVISION DETAIL
  https://reviews.facebook.net/D1731

                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208077#comment-13208077 ] 

Phabricator commented on HBASE-5241:
------------------------------------

stack has commented on the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".

  Not sure I completely grok what's going on.  Where is the extra cost we pay in seeking?

  Good stuff Amit.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:74 This is ugly!
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:189 This used to be compare to zero.  Was it wrong?
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:195 Or, I suppose -1L now means what 0L used to?
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java:277 Why this change?
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java:845 We remove these assertions because delete behavior has changed?

REVISION DETAIL
  https://reviews.facebook.net/D1731

                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209114#comment-13209114 ] 

Phabricator commented on HBASE-5241:
------------------------------------

aaiyer has commented on the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".

INLINE COMMENTS
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java:277 Technically, this should have never passed with the old settings expectedResults = 2.

  It was passing due to a bug that deleted version 0 whenever there was a delete for version 1. Changing familyStamp to -1 should fix this.

  But, I'm having trouble convincing myself expectedResults = 3 is undoubtedly correct. It seems debatable. Any thoughts?

REVISION DETAIL
  https://reviews.facebook.net/D1731

                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Amitanand Aiyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208777#comment-13208777 ] 

Amitanand Aiyer commented on HBASE-5241:
----------------------------------------

@Lars. I see your point. It's definitely debatable whether the timestamp is something that should be exposed to the client (to control) or something that should be considered an internal detail (so mess with it at your own risk).

We do have applications that control timestamp; so we might need this (internally at least). Not sure if that is the only one in use, or there are more.

Wrt the WALs. In the current codebase, I believe that we do not write memstoreTS. But, that can be fixed if needed.
                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Amitanand Aiyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208769#comment-13208769 ] 

Amitanand Aiyer commented on HBASE-5241:
----------------------------------------

@stack. The potential performance slow down on seek is due to this:

In ScanQueryMatcher, we used to return getNextRowOrNextColumn(bytes, offset, qualLength) for FAMILY_DELETED and COLUMN_DELETED; because once we see a KV that is deleted due to a family or a column delete, all the remaining KV's (with a lower timestamp) are guaranteed to be deleted.

Now, we return SKIP instead. This change is required because there might be a KV later in the file that has a lower timestamp but a higher memstoreTS (so that the deleteFamily does not apply to it). In this case, we end up moving one KV at a time instead of potentially skipping the entire column or row.
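
The cost described here can be illustrated with a small, self-contained sketch. It is not HBase's ScanQueryMatcher: the class SkipVersusSeekSketch, the Cell type, and the scan method are hypothetical names made up for this illustration, and it assumes a single column with one delete marker. It counts how many cells a scan examines under the old seek behavior versus the SKIP behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not HBase classes.
final class SkipVersusSeekSketch {

    /** Simplified stand-in for a KeyValue: user timestamp plus memstoreTS. */
    static final class Cell {
        final long timestamp;
        final long memstoreTS;

        Cell(long timestamp, long memstoreTS) {
            this.timestamp = timestamp;
            this.memstoreTS = memstoreTS;
        }
    }

    /** Cells returned to the reader and the number of cells the scan examined. */
    static final class ScanResult {
        final List<Cell> visible = new ArrayList<>();
        int examined = 0;
    }

    /**
     * Scans one column whose cells are sorted by descending timestamp.
     * The delete marker covers cells with timestamp <= deleteTs. When
     * strict is true, a covered cell is masked only if its memstoreTS is
     * not higher than the marker's (the rule assumed for this sketch).
     */
    static ScanResult scan(List<Cell> cells, long deleteTs, long deleteMemstoreTS, boolean strict) {
        ScanResult result = new ScanResult();
        for (Cell cell : cells) {
            result.examined++;
            if (cell.timestamp > deleteTs) {
                result.visible.add(cell); // newer than the delete by timestamp
            } else if (!strict) {
                break; // old behavior: all remaining cells are older, so seek past them
            } else if (cell.memstoreTS > deleteMemstoreTS) {
                result.visible.add(cell); // written after the delete, so not masked
            }
            // otherwise: SKIP this cell and examine the next one
        }
        return result;
    }
}
```

With cells (timestamp, memstoreTS) of (30, 1), (20, 2), (10, 5) and a delete at timestamp 25 with memstoreTS 3, the non-strict scan stops after examining two cells, while the strict scan examines all three and also returns the cell with timestamp 10.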




                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210076#comment-13210076 ] 

Lars Hofhansl commented on HBASE-5241:
--------------------------------------

Need to think about HBASE-4536 as well. The main problem there was to work out when it is safe to delete the (family) delete markers - the solution was to store the smallest TS of any Put KV in the store file's metadata. I think that method should still work with your change.

                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Amitanand Aiyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208772#comment-13208772 ] 

Amitanand Aiyer commented on HBASE-5241:
----------------------------------------

After some discussion, we figured that skipping to the next column, vs skipping to next KV shouldn't make so much of a difference. So, headed in this direction (as opposed to trying to create indices or change the sort order).


Will need to see once the patch is complete, if the read performance gets affected significantly.
                

        

[jira] [Updated] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5241:
-------------------------------

    Attachment: HBASE-5241.D1731.2.patch

aaiyer updated the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".
Reviewers: JIRA, Kannan, Karthik, stack, tedyu, nspiegelberg, jgray, lhofhansl, mbautin, gqchen

  the current state of affairs, supporting old behavior.

  will work more on this, once we hear from more folks on the directions we want to go.

REVISION DETAIL
  https://reviews.facebook.net/D1731

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestSeekOptimizations.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScanDeleteTracker.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/DeleteTracker.java
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209092#comment-13209092 ] 

Zhihong Yu commented on HBASE-5241:
-----------------------------------

I am in favor of internal timestamp so that we don't rely so much on real clock.
The initiative for this feature is to remove indeterminate behavior w.r.t. the timing of major compaction.

I am in support of this feature. We can turn it off by default.
                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210444#comment-13210444 ] 

Phabricator commented on HBASE-5241:
------------------------------------

aaiyer has commented on the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:1748 This will only happen for Deletes (Column and Family). The idea is that the Delete shall apply to all the puts, with a lower memstoreTS, regardless of their timestamp -- even if it is in "future".

  Subsequent Puts etc. will not get masked by the Delete, because they should have a memstoreTS that is larger.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:155 This is not yet in production. But, if we decide to go down this route, we will definitely test it out for performance.

  Haven't optimised much here, since I don't expect there to be too many family deletes.

  Will revisit if the assumption turns out to be false.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:155 I'm not sure if we want to put this under ENFORCE_STRICTER_SEMANTICS ....

  my understanding was that it would be better to have Puts not be masked by previous Deletes, regardless ....

  whether we are willing to pay the extra performance cost for it was the trade-off enforced using ENFORCE_STRICTER_SEMANTICS.

  If there is a good reason for clients to expect that the Put will be masked by previous Deletes, we can definitely guard this with the flag.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:173 Perhaps, I might rename this class to something different, and we can add a flag in ScanQueryMatcher to instantiate the appropriate DeleteTracker.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:223 Agree that this is going to be a performance issue here.

  But, this is just a V-1 to get the general idea out. I'm hopeful we can optimise the codepath so that we incur the performance penalty only when there is really a later KV with a higher memstoreTS.

  Currently, we do not have a way to tell that. But, it can be done, say by dumping a flag while writing the HFile if there is a memstoreTS inversion. Or something along those lines ....

  Will try to optimise this, if needed, along those lines.
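
  The "dump a flag while writing the HFile" idea might be sketched roughly as follows. This is a hypothetical illustration, not code from D1731: the class and method names are invented, and it assumes that an "inversion" means a cell that comes later in store-file order (that is, has a lower timestamp within the column) carries a higher memstoreTS than the cell immediately before it.

```java
// Hypothetical illustration only; not part of HBase or the D1731 patch.
final class MemstoreTsInversionSketch {

    /**
     * Takes the memstoreTS values of the cells of one column in store-file
     * order (descending timestamp) and reports whether any cell has a
     * higher memstoreTS than the cell that precedes it in the file.
     */
    static boolean hasInversion(long[] memstoreTsInFileOrder) {
        for (int i = 1; i < memstoreTsInFileOrder.length; i++) {
            if (memstoreTsInFileOrder[i] > memstoreTsInFileOrder[i - 1]) {
                return true;
            }
        }
        return false;
    }
}
```

  Under these assumptions, a writer could compute the flag once per file, and a reader would need to fall back to the slower SKIP path only for files where the flag is set.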

REVISION DETAIL
  https://reviews.facebook.net/D1731

                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209085#comment-13209085 ] 

Lars Hofhansl commented on HBASE-5241:
--------------------------------------

If two things happen at the same time there *is* no right order. The fact that we have limited timer resolution is not that relevant here.

We just discussed another scenario internally here, where we have application-level replicas of a table. One way to do this is to have the client write to two HBase clusters and also have a catch-up background task that copies older cells (from before we started the dual-writing) to the replica. We will use this a lot to catch up standby clusters, etc. This would also not work any longer.

Anyway, if there are other committers that feel that we need this I won't veto it (but I am -0.5 on it). And it must be configurable without any performance detriment when disabled (i.e. delete still seeks to the next column). I'd also vote to default off.

Maybe some of the other committers would like to comment? @Stack, @Ted, @Todd?

                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209533#comment-13209533 ] 

Lars Hofhansl commented on HBASE-5241:
--------------------------------------

@Ted: Fair enough.
Here's another thing that will break: Master-Master replication. The memstoreTSs generated by the regionserver have no meaning w.r.t. each other.
Also, since the replication sink accesses the replicated cluster through the normal API we need to add (public?) APIs to pass the memstoreTS through.

And I can already see folks who want to manipulate the memstoreTS from the outside, bringing us back to where we are.
                

        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210097#comment-13210097 ] 

Phabricator commented on HBASE-5241:
------------------------------------

lhofhansl has commented on the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".

  I have voiced my concern amply in the jira :)

  Implementation-wise this looks reasonable enough. See a few questions and comments inline.


INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:1748 Is this right? Now we're always dating column or family way into the future.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:223 See HBASE-4926 on why this might be a performance problem.
  The seeking was just recently put in to address issues like this.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:155 Ugh... Although there shouldn't be too many family delete markers.
  Are you going to do some performance tests (or is this in production at FB already?)
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:155 This needs to be conditioned on ENFORCE_STRICTER_SEMANTICS, right?
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:166 Same here
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:173 and here

REVISION DETAIL
  https://reviews.facebook.net/D1731

                
> Deletes should not mask Puts that come after it.
> ------------------------------------------------
>
>                 Key: HBASE-5241
>                 URL: https://issues.apache.org/jira/browse/HBASE-5241
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Amitanand Aiyer
>            Assignee: Amitanand Aiyer
>         Attachments: HBASE-5241.D1731.1.patch, HBASE-5241.D1731.2.patch, HBASE-5241.D1731.3.patch


        

[jira] [Assigned] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Amitanand Aiyer (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amitanand Aiyer reassigned HBASE-5241:
--------------------------------------

    Assignee: Amitanand Aiyer
    
> Deletes should not mask Puts that come after it.
> ------------------------------------------------
>
>                 Key: HBASE-5241
>                 URL: https://issues.apache.org/jira/browse/HBASE-5241
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Amitanand Aiyer
>            Assignee: Amitanand Aiyer
>         Attachments: HBASE-5241.D1731.1.patch, HBASE-5241.D1731.2.patch, HBASE-5241.D1731.3.patch


        

[jira] [Updated] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5241:
-------------------------------

    Attachment: HBASE-5241.D1731.3.patch

aaiyer updated the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".
Reviewers: JIRA, Kannan, Karthik, stack, tedyu, nspiegelberg, jgray, lhofhansl, mbautin, gqchen

  Address comments from Ted.

REVISION DETAIL
  https://reviews.facebook.net/D1731


                
> Deletes should not mask Puts that come after it.
> ------------------------------------------------
>
>                 Key: HBASE-5241
>                 URL: https://issues.apache.org/jira/browse/HBASE-5241
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Amitanand Aiyer
>         Attachments: HBASE-5241.D1731.1.patch, HBASE-5241.D1731.2.patch, HBASE-5241.D1731.3.patch


        

[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209142#comment-13209142 ] 

Phabricator commented on HBASE-5241:
------------------------------------

tedyu has commented on the revision "HBASE-5241 [jira] Deletes should not mask Puts that come after it.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/HConstants.java:402 Put HBASE-5241 here.
  src/main/java/org/apache/hadoop/hbase/HConstants.java:408 Should this be turned off by default ?
  src/main/java/org/apache/hadoop/hbase/regionserver/DeleteTracker.java:45 Add @param for memstoreTS in these two methods.
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:1750 If there is no better way of handling, remove this line.
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:1763 This special constant should be defined and documented.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:49 This member doesn't seem to be used anywhere.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:50 This neither.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:66 Would memstoreTSForDelete be a better name?
  Please add a comment.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:65 Would timestampForDelete be a better name?
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:89 Add @param for memstoreTS.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:93 Indentation for these two lines.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:60 Add comments for these fields please
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:62 How about naming this field memstoreTSForDeleteCol ?
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:118 Should we consider checking these two timestamps separately ?

REVISION DETAIL
  https://reviews.facebook.net/D1731

                
> Deletes should not mask Puts that come after it.
> ------------------------------------------------
>
>                 Key: HBASE-5241
>                 URL: https://issues.apache.org/jira/browse/HBASE-5241
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Amitanand Aiyer
>         Attachments: HBASE-5241.D1731.1.patch, HBASE-5241.D1731.2.patch
