
Posted to dev@hbase.apache.org by "Clint Morgan (JIRA)" <ji...@apache.org> on 2008/10/01 22:35:44 UTC

[jira] Created: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Scanner misses columns / rows when the scanner is obtained during a memcache flush
----------------------------------------------------------------------------------

                 Key: HBASE-910
                 URL: https://issues.apache.org/jira/browse/HBASE-910
             Project: Hadoop HBase
          Issue Type: Bug
          Components: regionserver
         Environment: latest trunk
            Reporter: Clint Morgan
            Priority: Critical
         Attachments: hbase-910.patch

I first noticed that some columns for a row were missing when they came from a scanner that was obtained while a memcache flush on the region was in progress. I tried to write a simple unit test to reproduce this; however, the problem I get in the unit test is that some rows are missed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640882#action_12640882 ] 

stack commented on HBASE-910:
-----------------------------

Good one Jim for digging in.

Don't you think it critical that we fix the hole, Jim?  That it become a blocker?  Otherwise, scanners get a different view of the data depending on when they run.


[jira] Updated: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "Clint Morgan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clint Morgan updated HBASE-910:
-------------------------------

    Attachment: hbase-910.patch

Patch to provoke the issue. The relevant piece of the log is:

2008-10-01 13:25:32,728 INFO org.apache.hadoop.hbase.TestScannerWhileMemcacheFlush: got scanner
2008-10-01 13:25:32,792 INFO org.apache.hadoop.hbase.TestScannerWhileMemcacheFlush: got scanner
2008-10-01 13:25:32,856 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region table,,1222892721305. Current region memcache size 25.3k
2008-10-01 13:25:32,857 INFO org.apache.hadoop.hbase.TestScannerWhileMemcacheFlush: got scanner
2008-10-01 13:25:32,995 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Added /user/clint.morgan/table/215004990/family/mapfiles/2754371528337048417 with 1000 entries, sequence id 1012, data size 25.3k, file size 39.4k
2008-10-01 13:25:32,996 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region table,,1222892721305 in 141ms, sequence id=1012, compaction requested=false
2008-10-01 13:25:32,997 WARN org.apache.hadoop.hbase.TestScannerWhileMemcacheFlush: Failing assert



[jira] Updated: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-910:
------------------------

    Attachment: 910-v2.patch

Adds a test to replicate the problem, and the fix.

This turns out to be a case we already handle elsewhere, over in compactions.  When we finish a compaction and there are outstanding scanners, we slot the change in Store readers in under the running scanner.  Here, what's happening is that the memcache contents are flushed and a new store file is created, but outstanding scanners were set up with a memcache scanner only; when the new flush file is added, outstanding scanners are blind to its content -- but not to the fact that the memcache has been rotated out.

I reused the mechanism that works for compactions: when a new reader is added, a store file scanner is added to each outstanding scanner.
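
To make the idea concrete, here is a minimal single-threaded sketch of "notify outstanding scanners when a new reader is added". It is a toy model, not the code in 910-v2.patch: ReaderListener, ToyFlushingStore, ToyScanner and UpdateReadersSketch are hypothetical names invented for illustration, and the "store file" is just an in-memory map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model with hypothetical names; not the HBase Store or scanner classes.
interface ReaderListener {
    // Called when a flush has produced a new "store file".
    void readersUpdated(TreeMap<String, String> newFile);
}

final class ToyFlushingStore {
    private TreeMap<String, String> memcache = new TreeMap<>();
    private final List<ReaderListener> listeners = new ArrayList<>();

    void put(String row, String value) { memcache.put(row, value); }
    TreeMap<String, String> currentMemcache() { return memcache; }
    void register(ReaderListener listener) { listeners.add(listener); }

    // Flush: rotate the memcache out, then tell every outstanding scanner about the new file.
    void flush() {
        TreeMap<String, String> newFile = memcache;
        memcache = new TreeMap<>();
        for (ReaderListener listener : listeners) listener.readersUpdated(newFile);
    }
}

final class ToyScanner implements ReaderListener {
    private final ToyFlushingStore store;
    private final List<TreeMap<String, String>> files = new ArrayList<>();
    private String lastRow = null;

    ToyScanner(ToyFlushingStore store) {
        this.store = store;
        store.register(this);
    }

    @Override
    public void readersUpdated(TreeMap<String, String> newFile) { files.add(newFile); }

    // Returns the smallest row after lastRow across the live memcache and the files we know of.
    String next() {
        List<TreeMap<String, String>> sources = new ArrayList<>(files);
        sources.add(store.currentMemcache());
        String best = null;
        for (TreeMap<String, String> source : sources) {
            String candidate = lastRow == null
                ? (source.isEmpty() ? null : source.firstKey())
                : source.higherKey(lastRow);
            if (candidate != null && (best == null || candidate.compareTo(best) < 0)) best = candidate;
        }
        if (best != null) lastRow = best;
        return best;
    }
}

final class UpdateReadersSketch {
    static List<String> scanAcrossFlush() {
        ToyFlushingStore store = new ToyFlushingStore();
        for (int i = 0; i < 4; i++) store.put("row" + i, "v" + i);
        ToyScanner scanner = new ToyScanner(store);
        List<String> seen = new ArrayList<>();
        seen.add(scanner.next());   // read one row from the memcache
        store.flush();              // memcache is rotated out mid-scan
        String row;
        while ((row = scanner.next()) != null) seen.add(row);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(scanAcrossFlush()); // prints [row0, row1, row2, row3]
    }
}
```

If the readersUpdated call in flush() is dropped, the same scan returns only row0, which is the shape of the bug described above. The real fix also has to deal with threading, which this sketch ignores.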


[jira] Commented: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719833#action_12719833 ] 

stack commented on HBASE-910:
-----------------------------

Chatting more with Ryan: a simple fix for the case where, on a snapshot, scanners miss what has been moved from memcache to snapshot would be the following:

{code}
durruti:cleantrunk stack$ svn diff src
Index: src/java/org/apache/hadoop/hbase/regionserver/Memcache.java
===================================================================
--- src/java/org/apache/hadoop/hbase/regionserver/Memcache.java (revision 785009)
+++ src/java/org/apache/hadoop/hbase/regionserver/Memcache.java (working copy)
@@ -539,17 +539,14 @@
   }
 
   /**
-   * @return scanner on memcache and snapshot in this order (if snapshot is
-   * empty, returns only memcache scanner).
+   * @return scanner on memcache and snapshot in this order.
    */
   KeyValueScanner [] getScanners() {
     this.lock.readLock().lock();
     try {
-      boolean noss = this.snapshot == null || this.snapshot.isEmpty();
-      KeyValueScanner [] scanners =
-        new KeyValueScanner[noss? 1: 2];
+      KeyValueScanner [] scanners = new KeyValueScanner[2];
       scanners[0] = new MemcacheScanner(this.memcache);
-      if (!noss) scanners[1] = new MemcacheScanner(this.snapshot);
+      scanners[1] = new MemcacheScanner(this.snapshot);
       return scanners;
     } finally {
       this.lock.readLock().unlock();
durruti:cleantrunk stack$ svn up
At revision 785013.
durruti:cleantrunk stack$ svn diff src
Index: src/java/org/apache/hadoop/hbase/regionserver/Memcache.java
===================================================================
--- src/java/org/apache/hadoop/hbase/regionserver/Memcache.java (revision 785013)
+++ src/java/org/apache/hadoop/hbase/regionserver/Memcache.java (working copy)
@@ -539,17 +539,14 @@
   }
 
   /**
-   * @return scanner on memcache and snapshot in this order (if snapshot is
-   * empty, returns only memcache scanner).
+   * @return scanner on memcache and snapshot in this order.
    */
   KeyValueScanner [] getScanners() {
     this.lock.readLock().lock();
     try {
-      boolean noss = this.snapshot == null || this.snapshot.isEmpty();
-      KeyValueScanner [] scanners =
-        new KeyValueScanner[noss? 1: 2];
+      KeyValueScanner [] scanners = new KeyValueScanner[2];
       scanners[0] = new MemcacheScanner(this.memcache);
-      if (!noss) scanners[1] = new MemcacheScanner(this.snapshot);
+      scanners[1] = new MemcacheScanner(this.snapshot);
       return scanners;
     } finally {
       this.lock.readLock().unlock();
{code}

In the above, we always have an open scanner on both memcache and snapshot.  The usual case is that the snapshot is empty, so nothing comes from it, but if a snapshot happens, the scanner starts getting its answers from there rather than from memcache.  This closes a hole, and we don't need to call updateReaders.
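
As a rough illustration of why always including the snapshot matters, here is a toy model. It rests on one loud assumption: each scanner source reads the store's current memcache and snapshot fields at read time, which may not match how MemcacheScanner binds to its underlying set. ToyMemcache, takeSnapshot, getSources and BothScannersSketch are hypothetical names, not the real Memcache API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical toy model; names do not correspond to the real Memcache class.
final class ToyMemcache {
    TreeSet<String> memcache = new TreeSet<>();
    TreeSet<String> snapshot = new TreeSet<>();

    // Rotate: the current memcache becomes the snapshot, writes go to a fresh set.
    void takeSnapshot() {
        snapshot = memcache;
        memcache = new TreeSet<>();
    }

    // alwaysBoth=false mimics the old behaviour (skip an empty snapshot);
    // alwaysBoth=true mimics the proposed behaviour (always include both sources).
    // ASSUMPTION: each source looks up the field when read, not when created.
    List<Supplier<TreeSet<String>>> getSources(boolean alwaysBoth) {
        List<Supplier<TreeSet<String>>> sources = new ArrayList<>();
        sources.add(() -> memcache);
        if (alwaysBoth || !snapshot.isEmpty()) sources.add(() -> snapshot);
        return sources;
    }
}

final class BothScannersSketch {
    // Smallest row after 'last' across all sources, or null when exhausted.
    static String next(List<Supplier<TreeSet<String>>> sources, String last) {
        String best = null;
        for (Supplier<TreeSet<String>> source : sources) {
            TreeSet<String> set = source.get();
            String candidate = last == null
                ? (set.isEmpty() ? null : set.first())
                : set.higher(last);
            if (candidate != null && (best == null || candidate.compareTo(best) < 0)) best = candidate;
        }
        return best;
    }

    static List<String> scanWithSnapshotAfterFirstRow(boolean alwaysBoth) {
        ToyMemcache mc = new ToyMemcache();
        for (int i = 0; i < 3; i++) mc.memcache.add("row" + i);
        List<Supplier<TreeSet<String>>> sources = mc.getSources(alwaysBoth);
        List<String> seen = new ArrayList<>();
        String last = next(sources, null);
        seen.add(last);
        mc.takeSnapshot(); // a snapshot happens after the scanner was opened
        while ((last = next(sources, last)) != null) seen.add(last);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(scanWithSnapshotAfterFirstRow(false)); // prints [row0]
        System.out.println(scanWithSnapshotAfterFirstRow(true));  // prints [row0, row1, row2]
    }
}
```

In this model the scanner that skipped the empty snapshot loses row1 and row2 once the rotation happens, while the scanner that always holds both sources keeps going without any notification.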


[jira] Assigned: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-910:
---------------------------

    Assignee: stack


[jira] Commented: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648707#action_12648707 ] 

stack commented on HBASE-910:
-----------------------------

The test no longer fails because it depended on the optional flush, which has since been removed.  Trying to replicate again.


[jira] Commented: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640877#action_12640877 ] 

Jim Kellerman commented on HBASE-910:
-------------------------------------

When a cache flush starts, it uses the current state of the cache as a snapshot, and creates a new cache so that updates will not be blocked during the flush.

A scanner can (and does) use that snapshot until the snapshot is fully written to disk. When the flush is complete, the snapshot is deleted to reclaim memory. If the scanner has not yet reached the end of the snapshot when it is deleted, some rows may be missed in the interval between the flush completing and the scanner being made aware of the new file that was just created.
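
The window described above can be modelled with a small single-threaded sketch. This is a toy under stated assumptions, not HBase code: ToyStore, FlushRaceSketch and all method names are hypothetical, and the "disk file" is just another in-memory map that the scanner is never told about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: ToyStore and its methods are hypothetical names, not HBase classes.
final class ToyStore {
    // Live in-memory cache that accepts new writes.
    private TreeMap<String, String> cache = new TreeMap<>();
    // Contents being flushed; non-null only while a flush is in progress.
    private TreeMap<String, String> snapshot = null;
    // Rows already written out by completed flushes.
    private final TreeMap<String, String> flushed = new TreeMap<>();

    void put(String row, String value) { cache.put(row, value); }

    // Start of a flush: the current cache becomes the snapshot, a fresh cache takes writes.
    void startFlush() {
        snapshot = cache;
        cache = new TreeMap<>();
    }

    // End of a flush: snapshot contents land in the flushed file and the snapshot is dropped.
    void finishFlush() {
        flushed.putAll(snapshot);
        snapshot = null;
    }

    int flushedRowCount() { return flushed.size(); }

    // Next row after lastRow, read from the in-memory snapshot only; null if none or no snapshot.
    String nextFromSnapshot(String lastRow) {
        if (snapshot == null) return null;
        if (lastRow == null) return snapshot.isEmpty() ? null : snapshot.firstKey();
        return snapshot.higherKey(lastRow);
    }
}

final class FlushRaceSketch {
    static List<String> scanAcrossFlush() {
        ToyStore store = new ToyStore();
        for (int i = 0; i < 5; i++) store.put("row" + i, "v" + i);
        store.startFlush();
        List<String> seen = new ArrayList<>();
        String last = null;
        // The scanner reads two rows from the snapshot...
        for (int i = 0; i < 2; i++) {
            last = store.nextFromSnapshot(last);
            seen.add(last);
        }
        // ...then the flush completes and the snapshot is deleted.
        store.finishFlush();
        // The scanner continues, but it has not been told about the new file.
        while ((last = store.nextFromSnapshot(last)) != null) seen.add(last);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(scanAcrossFlush()); // prints [row0, row1]
    }
}
```

All five rows end up in the flushed file, yet the scanner returns only the two it read before the snapshot disappeared.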

Closing this window would be very difficult for a couple of reasons:
- holding on to the snapshot until the scanner is done would increase memory pressure on the region server which could lead to the region server running out of memory
- it is inherently difficult to close the window between cache flush completion and notification of the scanner, as each is running in a different thread. Increasing synchronization would lead to an overall degradation in performance.

It is my impression that Bigtable does not allow client reads to "see" concurrent updates as each row in the memtable is copy on write so that reads and writes proceed in parallel.

In short, this is a very thorny issue, and HBase is much better in this respect than it was. Scanners used to get a view of the data as it was when the scanner was taken out which did not include data from flushes in progress and did not pick up new files created by cache flushes (at all) when the flush completed. Fixing this issue would require a significant amount of work.


[jira] Commented: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640886#action_12640886 ] 

Jim Kellerman commented on HBASE-910:
-------------------------------------

Should the hole be fixed? Yes.

Is it a blocker for 0.18.1? No, I don't think so, because this behavior has been around for quite some time, and fixing this problem will be time-consuming and would consequently hold up other critical fixes in 0.18.1 that people really need.

I would make it a blocker for 0.19.0 and 0.18.2, though.


[jira] Updated: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-910:
--------------------------------

    Fix Version/s: 0.19.0
         Priority: Blocker  (was: Critical)


[jira] Commented: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "Clint Morgan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640893#action_12640893 ] 

Clint Morgan commented on HBASE-910:
------------------------------------

Yeah, that's fine by me. Thanks for looking into this.


[jira] Commented: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640890#action_12640890 ] 

stack commented on HBASE-910:
-----------------------------

Agreed Jim.... blocker for 0.19.0.

Clint, you down with that?


[jira] Resolved: (HBASE-910) Scanner misses columns / rows when the scanner is obtained during a memcache flush

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-910.
-------------------------

    Resolution: Fixed

Committed.
