Posted to issues@hbase.apache.org by "Prakash Khemani (JIRA)" <ji...@apache.org> on 2011/05/02 22:01:04 UTC

[jira] [Created] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

data loss because lastSeqWritten can miss memstore edits
--------------------------------------------------------

Key: HBASE-3845
URL: https://issues.apache.org/jira/browse/HBASE-3845
Project: HBase
Issue Type: Bug
Reporter: Prakash Khemani

(I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.)

In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably.

After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore.

HLog.append() does a putIfAbsent into lastSeqWritten. This ensures that we only keep track of the earliest log-sequence-number that is present in the memstore.

Every time the memstore is flushed we remove the region's entry in lastSeqWritten and wait for the next append to populate the entry again. This is where the problem happens.

Step 1:
flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock().

Step 2:
As soon as updatesLock.writeLock() is released, new entries are added to the memstore.

Step 3:
wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten.

Step 4:
The next append creates a new entry for the region in lastSeqWritten, but its value is the log-seq-id of that append. All the edits that were added in step 2 have smaller log-seq-ids, so they are no longer covered by the entry.
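
The interleaving in steps 1 to 4 can be replayed deterministically. The following is only a sketch: the map, the region name "r1", the counter and the class LastSeqWrittenRace are hypothetical stand-ins, not HBase code, and the locking in step 1 is not modeled.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Deterministic replay of steps 1-4; a model, not HBase code. */
final class LastSeqWrittenRace {
    // Hypothetical stand-in for lastSeqWritten: region -> earliest unflushed log-seq-id.
    static final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    static long nextSeqId = 100;

    /** Mimics the putIfAbsent in append(): only the first edit after a removal is recorded. */
    static long append(String region) {
        long seq = nextSeqId++;
        lastSeqWritten.putIfAbsent(region, seq);
        return seq;
    }

    /** Returns {earliest edit still in the memstore, value tracked in the map}. */
    static long[] replay() {
        lastSeqWritten.clear();
        nextSeqId = 100;
        String region = "r1";
        append(region);                        // edit 100, captured by the snapshot
        // Step 1: the snapshot is taken here (lock not modeled).
        long firstUnflushed = append(region);  // Step 2: edit 101 goes to the new memstore
        append(region);                        // Step 2: edit 102 as well
        lastSeqWritten.remove(region);         // Step 3: completion removes the entry
        append(region);                        // Step 4: edit 103 repopulates the entry
        return new long[] {firstUnflushed, lastSeqWritten.get(region)};
    }

    public static void main(String[] args) {
        long[] r = replay();
        // prints "earliest unflushed edit = 101, tracked = 103"
        System.out.println("earliest unflushed edit = " + r[0] + ", tracked = " + r[1]);
    }
}
```

The tracked value (103) is later than the earliest edit that has not been flushed (101), which is the inconsistency described above.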

As a temporary measure, instead of removing the region's entry in step 3, I will replace it with the log-seq-id of the region-flush-event.
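
A rough sketch of that temporary measure, assuming the flush event is assigned its log-seq-id in step 1, before any of the step 2 edits. The class FlushSeqIdReplacement, the one-argument-pair completeCacheFlush() and the literal ids are hypothetical and only illustrate the idea.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Model of "replace instead of remove"; not HBase code. */
final class FlushSeqIdReplacement {
    // Hypothetical stand-in for lastSeqWritten: region -> earliest unflushed log-seq-id.
    static final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();

    /** Hypothetical variant of completeCacheFlush(): overwrite the entry, do not remove it. */
    static void completeCacheFlush(String region, long flushSeqId) {
        lastSeqWritten.put(region, flushSeqId);
    }

    /** Replays the same four steps and returns the value tracked afterwards. */
    static long trackedAfterFlush() {
        lastSeqWritten.clear();
        String region = "r1";
        lastSeqWritten.putIfAbsent(region, 100L); // edit 100, captured by the snapshot
        long flushSeqId = 101L;                   // id given to the flush event in step 1
        lastSeqWritten.putIfAbsent(region, 102L); // step 2 edit: no-op, entry still present
        lastSeqWritten.putIfAbsent(region, 103L); // step 2 edit: no-op
        completeCacheFlush(region, flushSeqId);   // step 3: replace rather than remove
        lastSeqWritten.putIfAbsent(region, 104L); // step 4 edit: no-op
        return lastSeqWritten.get(region);
    }

    public static void main(String[] args) {
        // prints "tracked = 101"
        System.out.println("tracked = " + trackedAfterFlush());
    }
}
```

The tracked value (101) is not the id of any memstore edit, but it is smaller than every edit added in step 2, so it is a safe lower bound.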

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-3845:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Attachment: HBASE-3845__trunk.patch

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069107#comment-13069107 ] 

Ted Yu commented on HBASE-3845:
-------------------------------

Patch doesn't apply cleanly on 0.90:
{noformat}
patching file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
Hunk #1 FAILED at 40.
Hunk #2 succeeded at 131 (offset -1 lines).
Hunk #4 succeeded at 871 (offset -28 lines).
Hunk #6 succeeded at 1167 (offset -27 lines).
1 out of 7 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java.rej
{noformat}

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090756#comment-13090756 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

{code}
+      if (wal != null) 
+        wal.abortCacheFlush(this.regionInfo.getEncodedNameAsBytes());
{code}
Please use braces here, since the statement continues on a second line.


> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_branch90V1.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090740#comment-13090740 ] 

gaojinchao commented on HBASE-3845:
-----------------------------------

@RAM
I have run all the unit tests. Please help review the patch first. Thanks.

I will set up the scenario and verify it today.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_branch90V1.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090881#comment-13090881 ] 

gaojinchao commented on HBASE-3845:
-----------------------------------

I verified the patch. I think it is OK.
I created a table (one region) and put a lot of data into it. The logs show that the sequence ids are continuous.
Code:
      // updateLock not needed for removing snapshot's entry
      // Cleaning up of lastSeqWritten is in the finally clause because we
      // don't want to confuse getOldestOutstandingSeqNum()
      this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
      Long seq = this.lastSeqWritten.get(encodedRegionName);
      if (null != seq) {
        LOG.error("gjc: end flush seq " + logSeqId + "current seq" + seq);
      } else {
        LOG.error("gjc: end flush seq " + logSeqId);
      }
The logs with the patch:
2011-08-25 04:11:50,807 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start flush seq495032
2011-08-25 04:11:50,808 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start flush seq495032current seq499908
2011-08-25 04:12:11,073 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: gjc: end flush seq 499908current seq499909
2011-08-25 04:12:11,700 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start flush seq499909
2011-08-25 04:12:11,700 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start flush seq499909current seq505058
2011-08-25 04:12:58,532 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: gjc: end flush seq 505058current seq505059
2011-08-25 04:12:58,784 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start flush seq505059

The logs before the patch:
2011-08-25 05:35:20,691 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start seq679214
2011-08-25 05:35:20,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:end current seq679215
2011-08-25 05:36:19,024 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start seq682145
2011-08-25 05:36:26,928 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:end current seq685931
2011-08-25 05:36:27,571 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start seq686209
2011-08-25 05:36:36,311 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:end current seq690191
2011-08-25 05:36:36,768 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start seq690244
2011-08-25 05:36:44,709 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:  gjc:end current seq693566
2011-08-25 05:36:45,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: gjc:start seq694126
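
The continuity claim can be checked mechanically. The sketch below assumes one particular reading of the debug lines: the "current seq" printed at the end of a flush should equal the "start" value printed at the beginning of the next flush. The class FlushSeqContinuity is hypothetical; the numbers are copied from the two log excerpts above.

```java
/** Compares the value logged at the end of a flush with the start value of the next flush. */
final class FlushSeqContinuity {
    static boolean isContinuous(long[] endOfFlush, long[] startOfNextFlush) {
        if (endOfFlush.length != startOfNextFlush.length) {
            throw new IllegalArgumentException("arrays must pair up one to one");
        }
        for (int i = 0; i < endOfFlush.length; i++) {
            if (endOfFlush[i] != startOfNextFlush[i]) {
                return false; // a gap: some edits fall between the two tracked values
            }
        }
        return true;
    }

    // "current seq" at the end of each flush, and the start value of the following flush.
    static final long[] PATCHED_END = {499909L, 505059L};
    static final long[] PATCHED_NEXT_START = {499909L, 505059L};
    static final long[] UNPATCHED_END = {679215L, 685931L, 690191L, 693566L};
    static final long[] UNPATCHED_NEXT_START = {682145L, 686209L, 690244L, 694126L};

    public static void main(String[] args) {
        System.out.println("with patch:   " + isContinuous(PATCHED_END, PATCHED_NEXT_START));
        System.out.println("before patch: " + isContinuous(UNPATCHED_END, UNPATCHED_NEXT_START));
    }
}
```

Under that reading, the first call reports true and the second reports false.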

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_branch90V1.patch, HBASE-3845_branch90V2.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070614#comment-13070614 ] 

Prakash Khemani commented on HBASE-3845:
----------------------------------------

In the method internalFlushcache() I don't see updatesLock.writeLock() being held around the following piece of code.

{code}
    if (wal != null) {
      wal.completeCacheFlush(this.regionInfo.getEncodedNameAsBytes(),
        regionInfo.getTableDesc().getName(), completeSequenceId,
        this.getRegionInfo().isMetaRegion());
    }
{code}
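
For comparison, this is roughly what holding the write lock around that call would look like. It is a sketch only: the Wal interface and the class CompleteFlushUnderLock are hypothetical, the real completeCacheFlush() takes more arguments, and the sketch assumes that writers take updatesLock.readLock() while appending.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Model of completing a flush under the updates write lock; not HBase code. */
final class CompleteFlushUnderLock {
    /** Hypothetical stand-in for the WAL; only the call under discussion is modeled. */
    interface Wal {
        void completeCacheFlush(long completeSequenceId);
    }

    static final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

    /** Runs the completion call while holding the write lock, so read-lock holders must wait. */
    static void completeUnderWriteLock(Wal wal, long completeSequenceId) {
        if (wal == null) {
            return;
        }
        updatesLock.writeLock().lock();
        try {
            wal.completeCacheFlush(completeSequenceId);
        } finally {
            updatesLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        completeUnderWriteLock(
            seq -> System.out.println("write lock held during completion: "
                + updatesLock.isWriteLockedByCurrentThread()),
            42L);
    }
}
```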

==

I will upload the internal patch for reference ...





> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-3845:
------------------------------

    Attachment: HBASE-3845_branch90V2.patch

Modified the code according to the review comments.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_branch90V1.patch, HBASE-3845_branch90V2.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070374#comment-13070374 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

Ted,
I tried using 'this.cacheFlushLock.isHeldByCurrentThread()'.
The problem is that HLog.append() may be called by other threads, whereas HRegion.internalFlushCache() is called by the memstore flusher thread.
So when we check this.cacheFlushLock.isHeldByCurrentThread() from append(), it returns false.

So, as per your suggestion, I have inlined the isFlushInProgress handling into wal.startCacheFlush() and wal.abortCacheFlush(), and I am still going with the AtomicBoolean.
Is that fine, Ted? I am planning to upload the patch with these changes.
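
The behaviour described here is easy to reproduce outside HBase. In this sketch the calling thread plays the flusher and a second thread plays the appender; the class LockOwnershipVisibility and the variable names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Shows what a second thread observes while another thread holds a ReentrantLock. */
final class LockOwnershipVisibility {
    /** Returns {isHeldByCurrentThread() seen by the appender, flag value seen by the appender}. */
    static boolean[] observeFromAppender() {
        ReentrantLock cacheFlushLock = new ReentrantLock();
        AtomicBoolean flushInProgress = new AtomicBoolean(false);
        boolean[] seen = new boolean[2];

        cacheFlushLock.lock();               // the caller acts as the flusher thread
        flushInProgress.set(true);
        try {
            Thread appender = new Thread(() -> {
                seen[0] = cacheFlushLock.isHeldByCurrentThread(); // asks about the appender itself
                seen[1] = flushInProgress.get();                  // visible from any thread
            });
            appender.start();
            appender.join();                 // join() makes the writes to seen[] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            flushInProgress.set(false);
            cacheFlushLock.unlock();
        }
        return seen;
    }

    public static void main(String[] args) {
        boolean[] seen = observeFromAppender();
        // prints "isHeldByCurrentThread = false, flushInProgress = true"
        System.out.println("isHeldByCurrentThread = " + seen[0] + ", flushInProgress = " + seen[1]);
    }
}
```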

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069657#comment-13069657 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

Thanks Ted for your comments.
{noformat}
Can we fold wal.setFlushInProgress() into wal.startCacheFlush() and wal.abortCacheFlush() to make the code cleaner ?
{noformat}
I think we have to reset the AtomicBoolean even if an exception happens in completeCacheFlush() or anywhere before it.
That is why I did it with a try/finally block, as per Stack's comments.
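
A small sketch of that try/finally pattern, with a hypothetical FlushFlagReset class and a Runnable standing in for the flush work:

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Resets the flag on every exit path, including exceptions; not HBase code. */
final class FlushFlagReset {
    static final AtomicBoolean flushInProgress = new AtomicBoolean(false);

    static void flush(Runnable flushWork) {
        flushInProgress.set(true);
        try {
            flushWork.run();             // may throw at any point before completion
        } finally {
            flushInProgress.set(false);  // runs on normal return and on exception
        }
    }

    public static void main(String[] args) {
        try {
            flush(() -> { throw new IllegalStateException("simulated flush failure"); });
        } catch (IllegalStateException expected) {
            // prints "flag after failed flush: false"
            System.out.println("flag after failed flush: " + flushInProgress.get());
        }
    }
}
```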

{noformat}
Actually we can check whether the current thread owns cacheFlushLock
{noformat}
I checked the link. The ReentrantLock.getOwner() API is protected, so to check whether cacheFlushLock is acquired by the current thread we would have to make cacheFlushLock an instance of a class that extends ReentrantLock.
But if we can do this, then we can avoid the AtomicBoolean.
Correct me if I am wrong.
Please comment if any changes are needed.
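
Such a subclass could look like the sketch below. OwnerAwareLock, its owner() method and the demo class are hypothetical names; getOwner() itself is the protected method of java.util.concurrent.locks.ReentrantLock.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical subclass that exposes the protected ReentrantLock.getOwner(). */
final class OwnerAwareLock extends ReentrantLock {
    public Thread owner() {
        return getOwner(); // null when the lock is not held
    }
}

final class OwnerAwareLockDemo {
    /** Returns the owner as seen from a second thread while the caller holds the lock. */
    static Thread ownerSeenByOtherThread(OwnerAwareLock lock) {
        Thread[] seen = new Thread[1];
        lock.lock();
        try {
            Thread other = new Thread(() -> { seen[0] = lock.owner(); });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.unlock();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        OwnerAwareLock cacheFlushLock = new OwnerAwareLock();
        Thread owner = ownerSeenByOtherThread(cacheFlushLock);
        System.out.println("owner is the locking thread: " + (owner == Thread.currentThread()));
    }
}
```

Unlike isHeldByCurrentThread(), owner() reports the holding thread regardless of which thread asks.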


> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch
>
>
> (I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.)
> In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably.
> After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore.
> HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure that we only keep track  of the earliest log-sequence-number that is present in the memstore.
> Every time the memstore is flushed we remove the region's entry in lastSequenceWritten and wait for the next append to populate this entry again. This is where the problem happens.
> step 1:
> flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock().
> step 2 :
> as soon as the updatesLock.writeLock() is released new entries will be added into the memstore.
> step 3 :
> wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten.
> step 4:
> the next append will create a new entry for the region in lastSeqWritten(). But this will be the log seq id of the current append. All the edits that were added in step 2 are missing.
> ==
> as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event.
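
The sequence in steps 1 to 4 above can be replayed in a small single-threaded model. This is only a sketch and not HBase code: the class name, the region name, and the assumption that the flush event takes its own sequence id before the step 2 edit arrives are all illustrative. The map stands in for lastSeqWritten and uses the same putIfAbsent/remove pattern the description mentions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of the lastSeqWritten bookkeeping; not HBase code.
class LastSeqWrittenRace {
    static final String REGION = "region-A"; // made-up region name

    // Replays steps 1-4 in order and returns the entry left in the map.
    // keepFlushSeqId == false models removing the entry in step 3;
    // keepFlushSeqId == true models the proposed temporary measure.
    static long entryAfterFlush(boolean keepFlushSeqId) {
        ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
        long seq = 0;

        // An edit before the flush: the first append populates the entry (seq 1).
        lastSeqWritten.putIfAbsent(REGION, ++seq);

        // Step 1: the memstore is snapshotted; the flush event is assumed
        // to take its own sequence id here (seq 2).
        long flushSeqId = ++seq;

        // Step 2: a new edit (seq 3) lands in the fresh memstore. putIfAbsent
        // is a no-op because the entry from seq 1 is still present.
        lastSeqWritten.putIfAbsent(REGION, ++seq);

        // Step 3: the flush completes.
        if (keepFlushSeqId) {
            lastSeqWritten.put(REGION, flushSeqId);
        } else {
            lastSeqWritten.remove(REGION);
        }

        // Step 4: the next append (seq 4) repopulates the entry if it is absent.
        lastSeqWritten.putIfAbsent(REGION, ++seq);

        return lastSeqWritten.get(REGION);
    }

    public static void main(String[] args) {
        System.out.println("entry when removed in step 3:  " + entryAfterFlush(false));
        System.out.println("entry when replaced in step 3: " + entryAfterFlush(true));
    }
}
```

In this model the step 2 edit has sequence id 3. When the entry is removed, the map ends up holding 4, so the unflushed edit 3 is no longer covered. When the entry is replaced with the flush sequence id, the map holds 2, which is conservative but still covers edit 3.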

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069385#comment-13069385 ] 

dhruba borthakur commented on HBASE-3845:
-----------------------------------------

Hi Prakash, would you like to review this one?

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071252#comment-13071252 ] 

Hudson commented on HBASE-3845:
-------------------------------

Integrated in HBase-TRUNK #2051 (See [https://builds.apache.org/job/HBase-TRUNK/2051/])
    HBASE-3845 data loss because lastSeqWritten can miss memstore edits

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087534#comment-13087534 ] 

stack commented on HBASE-3845:
------------------------------

Ok.  I wrote Gao to suggest he figure out what was finally applied to branch here, make a version of it for 0.90, test it, and apply the file here.  I'll commit it then.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3845:
-------------------------

    Fix Version/s:     (was: 0.90.4)
                   0.90.5

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073159#comment-13073159 ] 

Ted Yu commented on HBASE-3845:
-------------------------------

Applied to TRUNK.
TestResettingCounters passes now.

Thanks for the patch Anirudh.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Status: Patch Available  (was: Open)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068881#comment-13068881 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

I have submitted the patch for the 0.90.4 version.
I ran all the test cases and they passed.
However, automating this particular scenario in a test case was not feasible.
Please review and provide your comments.
I can submit a patch for trunk also.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Attachment: HBASE-3845_trunk_3.patch

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Attachment: HBASE-3845_2.patch

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069549#comment-13069549 ] 

Ted Yu commented on HBASE-3845:
-------------------------------

Can we fold wal.setFlushInProgress() into wal.startCacheFlush() and wal.abortCacheFlush() to make the code cleaner ?

Actually we can check whether the current thread owns cacheFlushLock (see http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/ReentrantLock.html#getOwner%28%29) so that we don't need to introduce the new AtomicBoolean.
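
As a possible alternative to getOwner(), ReentrantLock also has a public isHeldByCurrentThread() method, so an ownership check may not need a subclass. A minimal sketch follows; the class and method names are hypothetical, and it assumes the check only needs to answer "does this thread hold cacheFlushLock", not "is any thread flushing".

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not the HLog implementation.
class HeldByCurrentThreadSketch {
    static final ReentrantLock cacheFlushLock = new ReentrantLock();

    // True only while the calling thread itself holds cacheFlushLock.
    static boolean flushInProgressOnThisThread() {
        return cacheFlushLock.isHeldByCurrentThread();
    }

    public static void main(String[] args) {
        System.out.println("before lock: " + flushInProgressOnThisThread());
        cacheFlushLock.lock();
        try {
            System.out.println("while locked: " + flushInProgressOnThisThread());
        } finally {
            cacheFlushLock.unlock();
        }
    }
}
```

If a different thread (for example one doing an append) needs to know whether a flush is running, isHeldByCurrentThread() would return false there; isLocked() or an explicit flag would be needed in that case.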

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187842#comment-13187842 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

@Ted
We will upload a patch tomorrow.
                
> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_branch90V1.patch, HBASE-3845_branch90V2.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Attachment: HBASE-3845_4.patch

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071251#comment-13071251 ] 

stack commented on HBASE-3845:
------------------------------

I applied the patch to trunk. Waiting till 0.90.4 clears the blocks before applying to 0.90.5. Thanks for the patches, Prakash and Ramakrishna.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Fix Version/s: 0.90.4
           Status: Patch Available  (was: Open)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187778#comment-13187778 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

@Stack
This issue has to be merged to 0.90.  We faced the same problem in our cluster.

                
> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_branch90V1.patch, HBASE-3845_branch90V2.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069405#comment-13069405 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

@Stack,
Thanks for the comments.
Since wal.completeCacheFlush() may throw an exception, it's better to call wal.setFlushInProgress(false) in the finally clause of a try/finally block so the flag is always reset.
I will also add a method for 
{noformat}
      if (isFlushInProgress.get()) {
        this.seqWrittenWhileFlush.putIfAbsent(hriKey, seqNum);
      } else {
        this.lastSeqWritten.putIfAbsent(hriKey, seqNum);
      }
{noformat}
I will fix this and resubmit for both trunk and 0.90.
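
The two changes proposed in this comment, the try/finally reset and the helper method around the quoted branch, might look roughly like the sketch below. It is an illustration under assumptions, not the attached patch: the class `FlushAwareSeqTracker`, the methods `recordAppend`, `startFlush` and `finishFlush`, the `simulateFailure` switch, and the way the during-flush entry is merged back into lastSeqWritten are all hypothetical. Only the field names and the if/else branch come from the snippet above. The sketch is single-threaded and leaves out the locking the real code would need.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical tracker built around the branch quoted above; not the real HLog. */
final class FlushAwareSeqTracker {
  private final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
  private final ConcurrentMap<String, Long> seqWrittenWhileFlush = new ConcurrentHashMap<>();
  private final AtomicBoolean isFlushInProgress = new AtomicBoolean(false);

  /** The quoted branch, extracted into its own method. */
  void recordAppend(String hriKey, long seqNum) {
    if (isFlushInProgress.get()) {
      seqWrittenWhileFlush.putIfAbsent(hriKey, seqNum);
    } else {
      lastSeqWritten.putIfAbsent(hriKey, seqNum);
    }
  }

  void startFlush() {
    isFlushInProgress.set(true);
  }

  /** Assumed merge: the first seq id seen during the flush becomes the tracked one. */
  private void completeCacheFlush(String hriKey, boolean simulateFailure) throws IOException {
    if (simulateFailure) {
      throw new IOException("simulated WAL failure");
    }
    lastSeqWritten.remove(hriKey);
    Long firstDuringFlush = seqWrittenWhileFlush.remove(hriKey);
    if (firstDuringFlush != null) {
      lastSeqWritten.putIfAbsent(hriKey, firstDuringFlush);
    }
  }

  /** Resets the flag in finally so a failed completeCacheFlush cannot leave it set. */
  void finishFlush(String hriKey, boolean simulateFailure) throws IOException {
    try {
      completeCacheFlush(hriKey, simulateFailure);
    } finally {
      isFlushInProgress.set(false);
    }
  }

  boolean isFlushInProgress() {
    return isFlushInProgress.get();
  }

  Long oldestTracked(String hriKey) {
    return lastSeqWritten.get(hriKey);
  }
}
```

With this shape, an append that arrives between startFlush() and finishFlush() is remembered separately and survives the removal of the region's old entry. If completeCacheFlush() throws, the flag is still cleared, so later appends go back to lastSeqWritten.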


> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Status: Open  (was: Patch Available)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Todi updated HBASE-3845:
--------------------------------

    Attachment: HBASE-3845-fix-TestResettingCounters-test.txt

Hi folks, I have been working with Prakash.

The patch I have submitted should fix the issue with TestResettingCounters failing.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Vlad Dogaru (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072082#comment-13072082 ] 

Vlad Dogaru commented on HBASE-3845:
------------------------------------

This patch seems to break TestResettingCounters for me.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070545#comment-13070545 ] 

Ted Yu commented on HBASE-3845:
-------------------------------

@Prakash:
Would you be able to share your patch?

>> The bigger problem here is that completeCacheFlush() is not called with updatedLock acquired.
See line 1154 in HLog:
{code}
      synchronized (updateLock) {
{code}

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Khemani updated HBASE-3845:
-----------------------------------

    Attachment: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch

Patch deployed internally at Facebook.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027816#comment-13027816 ] 

stack commented on HBASE-3845:
------------------------------

The scenario you describe seems plausible Prakash.  Let me up the priority of this issue.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3845:
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>
> (I don't have a test case to prove this yet, but I have run it by Dhruba and Kannan internally and wanted to put it up for feedback.)
> In this discussion, assume that the region has only one column family, so that "region" and "memstore" can be used interchangeably.
> After a memstore flush, lastSeqWritten can hold a log-sequence-id for a region that is not the earliest log-sequence-id in that region's memstore.
> HLog.append() does a putIfAbsent into lastSeqWritten. This ensures that we only keep track of the earliest log-sequence-number that is present in the memstore.
> Every time the memstore is flushed, we remove the region's entry from lastSeqWritten and wait for the next append to populate the entry again. This is where the problem happens.
> Step 1:
> flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock().
> Step 2:
> As soon as updatesLock.writeLock() is released, new entries can be added to the memstore.
> Step 3:
> wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten.
> Step 4:
> The next append creates a new entry for the region in lastSeqWritten, but with the log-seq-id of the current append. The edits added in step 2 are therefore not covered by the entry.
> ==
> As a temporary measure, instead of removing the region's entry in step 3, I will replace it with the log-seq-id of the region-flush event.
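
The four steps quoted above can be replayed in a fixed order with a toy model. This is only a sketch under stated assumptions: the class LastSeqWrittenRace, its String region key, and the single counter are hypothetical stand-ins, not the real HLog code. It contrasts removing the entry in step 3 with the proposed temporary measure of replacing it with the flush event's seq id.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy model of the bookkeeping described in the issue; not the real HLog. */
final class LastSeqWrittenRace {
    // Hypothetical stand-in for lastSeqWritten: region -> earliest unflushed seq id.
    final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    long logSeqNum = 0;

    /** Models append(): only the first append after a reset records its seq id. */
    long append(String region) {
        long seq = ++logSeqNum;
        lastSeqWritten.putIfAbsent(region, seq);
        return seq;
    }

    /** Replays steps 1-4; returns {earliest unflushed seq id, tracked seq id}. */
    static long[] replay(boolean replaceInsteadOfRemove) {
        LastSeqWrittenRace wal = new LastSeqWrittenRace();
        String region = "r1";
        wal.append(region);                           // edit that the flush will persist
        long flushSeq = ++wal.logSeqNum;              // step 1: snapshot taken, flush seq id assigned
        long earliestUnflushed = wal.append(region);  // step 2: edit arrives during the flush
        if (replaceInsteadOfRemove) {
            wal.lastSeqWritten.put(region, flushSeq); // proposed temporary measure
        } else {
            wal.lastSeqWritten.remove(region);        // step 3 as described in the issue
        }
        wal.append(region);                           // step 4: next append
        return new long[] { earliestUnflushed, wal.lastSeqWritten.get(region) };
    }

    public static void main(String[] args) {
        long[] removed = replay(false);
        long[] replaced = replay(true);
        System.out.println("remove:  earliest unflushed=" + removed[0] + ", tracked=" + removed[1]);
        System.out.println("replace: earliest unflushed=" + replaced[0] + ", tracked=" + replaced[1]);
    }
}
```

In this model the entry is safe only when the tracked value is less than or equal to the earliest unflushed seq id. With removal the tracked value ends up later than the step 2 edit; with replacement it stays at the flush seq id, which precedes that edit.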

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Attachment: HBASE-3845_5.patch

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070860#comment-13070860 ] 

Ted Yu commented on HBASE-3845:
-------------------------------

+1 on HBASE-3845_trunk_3.patch

Ran unit tests and they passed.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090882#comment-13090882 ] 

gaojinchao commented on HBASE-3845:
-----------------------------------

@Stack
Please review the patch and give some suggestions. :)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_branch90V1.patch, HBASE-3845_branch90V2.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069662#comment-13069662 ] 

Ted Yu commented on HBASE-3845:
-------------------------------

I should have looked further down the API list:
{code}
http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/ReentrantLock.html#isHeldByCurrentThread%28%29
{code}
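
For reference, a small sketch of the method linked above. The lock here is a hypothetical stand-in created only for the demonstration; it is not one of the locks touched by the patch.

```java
import java.util.concurrent.locks.ReentrantLock;

final class HeldByCurrentThreadDemo {
    /** Returns {held before lock(), held while locked, held after unlock()}. */
    static boolean[] probe() {
        ReentrantLock lock = new ReentrantLock();
        boolean before = lock.isHeldByCurrentThread();
        boolean during;
        lock.lock();
        try {
            during = lock.isHeldByCurrentThread();
        } finally {
            lock.unlock();
        }
        boolean after = lock.isHeldByCurrentThread();
        return new boolean[] { before, during, after };
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

The method reports ownership for the calling thread only, so it suits assertions and debugging checks rather than coordination between threads.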

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Status: Patch Available  (was: Open)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072189#comment-13072189 ] 

Prakash Khemani commented on HBASE-3845:
----------------------------------------

I agree that the test case is OK. We should change the HLog code to account for the case that lastSeqWritten might not be updated if WAL is not being written to.
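
One possible reading of this point, as a sketch only: if an edit skips the WAL, nothing is appended, so the region may have no entry in lastSeqWritten when a flush starts, and the flush path has to tolerate that. The class SkippedWalSketch, its put/startFlush methods, and the String key are hypothetical and are not taken from the HLog source or from the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SkippedWalSketch {
    // Hypothetical stand-in for lastSeqWritten: region -> earliest unflushed seq id.
    final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    long logSeqNum;

    /** Models an edit; the memstore update itself is omitted. */
    void put(String region, boolean writeToWAL) {
        if (writeToWAL) {
            lastSeqWritten.putIfAbsent(region, ++logSeqNum);
        }
        // With writeToWAL == false nothing is appended, so no entry is created.
    }

    /** Relaxed flush start: a missing entry is allowed and reported as null. */
    Long startFlush(String region) {
        return lastSeqWritten.remove(region);
    }

    public static void main(String[] args) {
        SkippedWalSketch s = new SkippedWalSketch();
        s.put("r1", false);
        System.out.println("entry at flush start: " + s.startFlush("r1"));
    }
}
```

A strict check that insists on an entry being present at flush time would fail for a region that only received edits with the WAL skipped.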



> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3845:
--------------------------

    Fix Version/s:     (was: 0.90.5)
                   0.92.0

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070470#comment-13070470 ] 

Ted Yu commented on HBASE-3845:
-------------------------------

That is fine. 

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Attachment: HBASE-3845_6.patch

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069383#comment-13069383 ] 

stack commented on HBASE-3845:
------------------------------

Ram:

After calling wal.setFlushInProgress(true), should we then go into a try/finally so that we are sure to clear this state if we get an exception before we reach wal.setFlushInProgress(false)?
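
A minimal sketch of the try/finally shape being suggested. The Wal class below is a hypothetical stand-in that models only the flag named in this comment, and flush(...) is an illustrative wrapper, not a method from the patch.

```java
final class FlushGuardSketch {
    /** Hypothetical stand-in for the WAL; only the in-progress flag is modelled. */
    static final class Wal {
        private volatile boolean flushInProgress;
        void setFlushInProgress(boolean value) { flushInProgress = value; }
        boolean isFlushInProgress() { return flushInProgress; }
    }

    /** Runs the flush body and clears the flag even if the body throws. */
    static void flush(Wal wal, Runnable flushBody) {
        wal.setFlushInProgress(true);
        try {
            flushBody.run();
        } finally {
            wal.setFlushInProgress(false);
        }
    }

    public static void main(String[] args) {
        Wal wal = new Wal();
        try {
            flush(wal, () -> { throw new IllegalStateException("simulated flush failure"); });
        } catch (IllegalStateException e) {
            System.out.println("flag after failure: " + wal.isFlushInProgress());
        }
    }
}
```

The exception still propagates to the caller; the finally block only guarantees that later appends do not keep seeing a stale in-progress flag.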

We do this in two places:

{code}
+        this.seqWrittenWhileFlush.putIfAbsent(hriKey, seqNum);
+      } else {
+        this.lastSeqWritten.putIfAbsent(hriKey, seqNum);
{code}

Maybe make a method?  Also, it seems a little odd: the first time we do the above, we pass a long to seqWrittenWhileFlush but Bytes.toLong(seqNum) to lastSeqWritten, whereas in the snippet above we pass the same value to both.  Is this right?
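
One way the duplicated branch could be folded into a single method, as a sketch only. The two map names come from the snippet above; the class SeqTrackerSketch, the method name recordSeqNum, the flushInProgress field, and the String key type are assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SeqTrackerSketch {
    final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    final ConcurrentMap<String, Long> seqWrittenWhileFlush = new ConcurrentHashMap<>();
    volatile boolean flushInProgress;

    /** Records seqNum in whichever map is active; an existing entry is never overwritten. */
    void recordSeqNum(String hriKey, long seqNum) {
        ConcurrentMap<String, Long> target =
            flushInProgress ? seqWrittenWhileFlush : lastSeqWritten;
        target.putIfAbsent(hriKey, seqNum);
    }

    public static void main(String[] args) {
        SeqTrackerSketch tracker = new SeqTrackerSketch();
        tracker.recordSeqNum("r1", 5L);
        tracker.flushInProgress = true;
        tracker.recordSeqNum("r1", 6L);
        System.out.println(tracker.lastSeqWritten + " " + tracker.seqWrittenWhileFlush);
    }
}
```

Taking a single long parameter also makes both call sites pass the same kind of value, which speaks to the type question raised above.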

Otherwise, patch looks great.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Status: Open  (was: Patch Available)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073168#comment-13073168 ] 

Hudson commented on HBASE-3845:
-------------------------------

Integrated in HBase-TRUNK #2064 (See [https://builds.apache.org/job/HBase-TRUNK/2064/])
    HBASE-3845 Addendum: relax lastSeqWritten check in case write to WAL is skipped

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>
> (I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.)
> In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably.
> After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore.
> HLog.append() does a putIfAbsent into lastSeqWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore.
> Every time the memstore is flushed we remove the region's entry in lastSeqWritten and wait for the next append to populate this entry again. This is where the problem happens.
> step 1:
> flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock().
> step 2 :
> as soon as the updatesLock.writeLock() is released new entries will be added into the memstore.
> step 3 :
> wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten.
> step 4:
> the next append will create a new entry for the region in lastSeqWritten. But this will be the log seq id of the current append. All the edits that were added in step 2 are missing.
> ==
> as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event.
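
The four steps above can be modeled in a few lines. The following is a minimal, single-threaded sketch, not HBase code: the class FlushRaceSketch, its methods, and the region name are hypothetical, and it assumes the flush event obtains its own log sequence id at the moment the memstore is snapshotted. It compares removing the region's entry in step 3 with replacing it, as proposed in the description.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, single-threaded model of the bookkeeping described above.
// None of these names are HBase APIs; they only mirror the four steps.
class FlushRaceSketch {
    static final String REGION = "region-1";

    final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    final AtomicLong logSeqNum = new AtomicLong();
    long oldestUnflushedSeq = -1; // oldest edit in the live memstore; -1 if empty

    // Models an append: putIfAbsent records only the first sequence id seen.
    long append() {
        long seq = logSeqNum.incrementAndGet();
        lastSeqWritten.putIfAbsent(REGION, seq);
        if (oldestUnflushedSeq < 0) {
            oldestUnflushedSeq = seq;
        }
        return seq;
    }

    // Step 1: snapshot the memstore. Assumption: the flush event takes its
    // own sequence id here, before any later edit is appended.
    long snapshot() {
        oldestUnflushedSeq = -1; // the live memstore is empty after the snapshot
        return logSeqNum.incrementAndGet();
    }

    // Step 3: remove the entry (reported behavior) or replace it with the
    // flush sequence id (the temporary measure from the description).
    void completeCacheFlush(long flushSeq, boolean replaceInsteadOfRemove) {
        if (replaceInsteadOfRemove) {
            lastSeqWritten.put(REGION, flushSeq);
        } else {
            lastSeqWritten.remove(REGION);
        }
    }

    // Runs steps 1-4 and returns {tracked id, oldest unflushed id}.
    static long[] run(boolean replaceInsteadOfRemove) {
        FlushRaceSketch s = new FlushRaceSketch();
        s.append();                   // edit before the flush
        long flushSeq = s.snapshot(); // step 1
        s.append();                   // step 2: edit lands during the flush
        s.completeCacheFlush(flushSeq, replaceInsteadOfRemove); // step 3
        s.append();                   // step 4: next append
        return new long[] { s.lastSeqWritten.get(REGION), s.oldestUnflushedSeq };
    }

    public static void main(String[] args) {
        long[] removed = run(false);
        long[] replaced = run(true);
        System.out.println("remove:  tracked=" + removed[0]
                + " oldestUnflushed=" + removed[1]);
        System.out.println("replace: tracked=" + replaced[0]
                + " oldestUnflushed=" + replaced[1]);
    }
}
```

In this model, removal leaves the tracked id later than the oldest unflushed edit, so a log file containing that edit could look safe to discard. Replacement leaves a conservative id that is never later than the oldest unflushed edit.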

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Attachment: HBASE-3845_1.patch

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3845:
-------------------------

             Priority: Critical  (was: Major)
    Affects Version/s: 0.90.3

Filed against 0.90.3 and made critical.

Any test to demonstrate the behavior, Prakash?

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Priority: Critical
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-3845:
------------------------------

    Attachment: HBASE-3845_branch90V1.patch

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_branch90V1.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187820#comment-13187820 ] 

Zhihong Yu commented on HBASE-3845:
-----------------------------------

@Jinchao:
I don't see getSnapshotName() in HLog.java under 0.90.

Can you attach a complete patch?
                
> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_branch90V1.patch, HBASE-3845_branch90V2.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027823#comment-13027823 ] 

dhruba borthakur commented on HBASE-3845:
-----------------------------------------

Good find, Prakash!

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Priority: Critical
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Status: Open  (was: Patch Available)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087528#comment-13087528 ] 

stack commented on HBASE-3845:
------------------------------

@Gaojinchao Do you need this on branch?

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Status: Patch Available  (was: Open)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069669#comment-13069669 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

@Ted,
Thanks. I missed it too, sorry. I will prepare the patch, verify it, and resubmit it as soon as possible.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845__trunk.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069380#comment-13069380 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

@Ted,
I think the file was changed after the patch was prepared.
Sorry for that. I have resubmitted the patch.
Thanks.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch
>
>

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071086#comment-13071086 ] 

Prakash Khemani commented on HBASE-3845:
----------------------------------------

The patch deployed internally at Facebook is attached as 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch
>
>

[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-3845:
------------------------------------------

    Attachment: HBASE-3845_trunk_2.patch

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch
>
>
> (I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.)
> In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably.
> After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore.
> HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure that we only keep track  of the earliest log-sequence-number that is present in the memstore.
> Every time the memstore is flushed we remove the region's entry in lastSequenceWritten and wait for the next append to populate this entry again. This is where the problem happens.
> step 1:
> flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock().
> step 2 :
> as soon as the updatesLock.writeLock() is released new entries will be added into the memstore.
> step 3 :
> wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten.
> step 4:
> the next append will create a new entry for the region in lastSeqWritten(). But this will be the log seq id of the current append. All the edits that were added in step 2 are missing.
> ==
> as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event.
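
To make the interleaving in steps 1-4 concrete, here is a minimal single-threaded replay. It is only a sketch: the class and method names are hypothetical, a plain ConcurrentHashMap stands in for lastSeqWritten, and it assumes the flush sequence id is obtained at snapshot time, before any step 2 edits are appended.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Single-threaded replay of steps 1-4; every name here is illustrative, not HBase code. */
public class LastSeqWrittenRace {
    // region name -> earliest log-sequence-id believed to still be in the memstore
    static final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    static long logSeqNum = 0;

    /** Mimics the putIfAbsent bookkeeping described for HLog.append(). */
    static long append(String region) {
        long seq = ++logSeqNum;
        lastSeqWritten.putIfAbsent(region, seq);
        return seq;
    }

    /** Replays the steps and returns {earliest unflushed seq id, seq id tracked in the map}. */
    static long[] replay(boolean replaceInsteadOfRemove) {
        lastSeqWritten.clear();
        logSeqNum = 0;
        String region = "r1";

        append(region);                        // an edit that ends up in the snapshot
        long flushSeq = ++logSeqNum;           // step 1: snapshot taken (assumed: flush seq id assigned here)
        long firstUnflushed = append(region);  // step 2: new edit; putIfAbsent is a no-op

        // step 3: completeCacheFlush()
        if (replaceInsteadOfRemove) {
            lastSeqWritten.put(region, flushSeq);  // the proposed temporary measure
        } else {
            lastSeqWritten.remove(region);         // the behaviour described in the report
        }

        append(region);                        // step 4: the next append
        return new long[] { firstUnflushed, lastSeqWritten.get(region) };
    }

    public static void main(String[] args) {
        long[] buggy = replay(false);
        long[] patched = replay(true);
        System.out.println("remove:  earliest unflushed=" + buggy[0] + " tracked=" + buggy[1]);
        System.out.println("replace: earliest unflushed=" + patched[0] + " tracked=" + patched[1]);
    }
}
```

With removal, the map tracks sequence id 4 although the edit with id 3 is still unflushed, so a log containing that edit could be considered safe to discard. With replacement, the tracked id (2) is no later than the earliest unflushed edit.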


[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070548#comment-13070548 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

Thank you very much for the comments, Prakash.
One query:
{noformat}
The bigger problem here is that completeCacheFlush() is not called with updatesLock acquired. Therefore there might still be correctness issues with the latest patch.
{noformat}

As per the current code, completeCacheFlush() is called with updatesLock held; only the sync() and the finally block are outside the lock. Could you please elaborate on the correctness issue?
I can implement the other two comments.





> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Jieshan Bean (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087530#comment-13087530 ] 

Jieshan Bean commented on HBASE-3845:
-------------------------------------

Yes, we need this patch on the branch :)

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086805#comment-13086805 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

Yes, Gao. The fix has not gone into the 0.90.x branch; it is available in trunk only.

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072186#comment-13072186 ] 

ramkrishna.s.vasudevan commented on HBASE-3845:
-----------------------------------------------

The test case TestResettingCounters is failing because none of the increment operations it performs are written to the WAL.
When we do a cache flush we call wal.startCacheFlush(), where we check whether
'Long seq = this.lastSeqWritten.remove(encodedRegionName)'
is null.
If it is null we throw an error and halt the system.
In this test case, wherever we call region.increment, for example
'for (int i=0;i<5;i++) region.increment(odd, null, false);'
we pass false for the write-to-WAL argument, so nothing is ever appended for the region and lastSeqWritten has no entry for it when the flush starts. We can correct the test case by passing true instead of false; I have verified that this works.
However, I think we should not halt the system in this case. We can change this behaviour.
Please correct me if my analysis is wrong.
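
The failure mode described above can be reproduced in isolation. The following is a sketch under stated assumptions, not the patch itself: the class and method names are hypothetical, an exception stands in for Runtime.getRuntime().halt(1) so that the example stays runnable, and the lenient variant is only one possible way to avoid halting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative only: shows why edits that skip the WAL leave no lastSeqWritten entry. */
public class UnloggedEditsSketch {
    public final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    private long logSeqNum = 0;

    /** The WAL bookkeeping is touched only when writeToWal is true. */
    public void increment(String region, boolean writeToWal) {
        if (writeToWal) {
            lastSeqWritten.putIfAbsent(region, ++logSeqNum);
        }
        // The memstore update itself is omitted from this sketch.
    }

    /** Strict variant: a missing entry is treated as a fatal inconsistency. */
    public long startCacheFlushStrict(String region) {
        Long seq = lastSeqWritten.remove(region);
        if (seq == null) {
            // Stand-in for the halt discussed in this thread.
            throw new IllegalStateException("no lastSeqWritten entry for " + region);
        }
        return seq;
    }

    /** Lenient variant (hypothetical): nothing was logged, so there is nothing to track. */
    public Long startCacheFlushLenient(String region) {
        return lastSeqWritten.remove(region); // null when no edit reached the WAL
    }
}
```

Calling increment five times with writeToWal set to false and then startCacheFlushStrict throws, which mirrors the test failure; with writeToWal set to true the strict variant returns the first sequence id.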

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071250#comment-13071250 ] 

stack commented on HBASE-3845:
------------------------------

Nice explanatory comments. This is radical, '+      Runtime.getRuntime().halt(1);', but I can live with it (it should never happen, it seems). getSnapshotName could use the Bytes utility for copying bytes, but it's fine as is.

I'm game for applying this version. The patches do similar things, but this one is a little more thorough, with more explanation. It sounds like it got a bit of airing on a real cluster too.



> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070542#comment-13070542 ] 

Prakash Khemani commented on HBASE-3845:
----------------------------------------

In the patch that is deployed internally we have implemented a different approach. We remove the region's entry in startCacheFlush() and save it (as opposed to the current behavior of removing the entry in completeCacheFlush()). If the flush aborts then we restore the saved entry.
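
The internally deployed patch is not attached here, so the following is only a guess at the shape of the save-and-restore approach described above. The class, the savedForFlush map, and the method bodies are all assumptions; startCacheFlush() is assumed to run while writers are blocked.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: remove the entry when the flush starts, restore it if the flush aborts. */
public class SaveAndRestoreSketch {
    public final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    // Hypothetical holder for the entry taken out by an in-progress flush.
    public final ConcurrentMap<String, Long> savedForFlush = new ConcurrentHashMap<>();
    private long logSeqNum = 0;

    public long append(String region) {
        long seq = ++logSeqNum;
        lastSeqWritten.putIfAbsent(region, seq);
        return seq;
    }

    /** Assumed to be called at snapshot time, with no concurrent appends. */
    public long startCacheFlush(String region) {
        Long oldest = lastSeqWritten.remove(region);
        if (oldest != null) {
            savedForFlush.put(region, oldest);
        }
        return ++logSeqNum; // sequence id of the flush event
    }

    /** The snapshot is persisted, so the saved entry is no longer needed. */
    public void completeCacheFlush(String region) {
        savedForFlush.remove(region);
    }

    /** The snapshot edits are still unflushed, so the older saved id must win. */
    public void abortCacheFlush(String region) {
        Long saved = savedForFlush.remove(region);
        if (saved != null) {
            lastSeqWritten.put(region, saved);
        }
    }
}
```

Because the entry is already gone when the first post-snapshot append arrives, that append's putIfAbsent records the correct earliest id, and completeCacheFlush() never has to touch lastSeqWritten.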

The approach taken in the latest patch in this jira might also be OK. I have a few comments:

{noformat}
           this.lastSeqWritten.remove(encodedRegionName);
+          Long seqWhileFlush = this.seqWrittenWhileFlush.get(encodedRegionName);
+          if (null != seqWhileFlush) {
+            this.lastSeqWritten.putIfAbsent(encodedRegionName, seqWhileFlush);
+            this.seqWrittenWhileFlush.remove(encodedRegionName);
+   
{noformat}

The seqWrittenWhileFlush.get() and the subsequent .remove() can be replaced by a single .remove():
{code}
// remove() returns the previous value (or null), so no separate get() is needed
Long seqWhileFlush = this.seqWrittenWhileFlush.remove(encodedRegionName);
if (null != seqWhileFlush) {
  this.lastSeqWritten.put(encodedRegionName, seqWhileFlush);
} else {
  this.lastSeqWritten.remove(encodedRegionName);
}
{code}

==
The bigger problem here is that completeCacheFlush() is not called with updatesLock acquired. Therefore there might still be correctness issues with the latest patch.

==

{noformat}
   public void abortCacheFlush() {
+    this.isFlushInProgress.set(false);
     this.cacheFlushLock.unlock();
   }
{noformat}
Shouldn't seqWrittenWhileFlush also be cleaned up in abortCacheFlush()?


> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch

[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086728#comment-13086728 ] 

gaojinchao commented on HBASE-3845:
-----------------------------------

Hi, has the patch not yet been applied to the branch?

> data loss because lastSeqWritten can miss memstore edits
> --------------------------------------------------------
>
>                 Key: HBASE-3845
>                 URL: https://issues.apache.org/jira/browse/HBASE-3845
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Prakash Khemani
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch