Posted to issues@hbase.apache.org by "Lars Hofhansl (Created) (JIRA)" <ji...@apache.org> on 2012/03/13 05:41:43 UTC

[jira] [Created] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
-----------------------------------------------------------------------

                 Key: HBASE-5569
                 URL: https://issues.apache.org/jira/browse/HBASE-5569
             Project: HBase
          Issue Type: Bug
            Reporter: Lars Hofhansl
            Priority: Minor


What I pieced together so far is that it is the *scanning* side that has problems sometimes.

Every time I see an assertion failure in the log, I see this just before it:
{quote}
2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
{quote}
The order of the Put and Delete is sometimes reversed.

The test threads should always see exactly one KV. If the "before" was the Put, the thread sees 0 KVs; if the "before" was the Delete, the thread sees 2 KVs.

This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issue with scanning sometimes :(
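
To make the "exactly one KV" expectation concrete, here is a minimal, self-contained Java sketch. It is a toy model and not HBase code: the `Cell` class and the `visibleCount` helper are hypothetical names, the third cell (an older Put at timestamp 75100) is an assumption added for illustration, and version limits are ignored. It only shows the general rule that a DeleteColumn marker hides Puts in the same column with an equal or older timestamp, so the number of visible KVs depends on whether the reader sees the marker. It does not claim to reproduce the actual failure.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical and are not HBase classes.
final class DeleteMaskingSketch {

    enum Type { PUT, DELETE_COLUMN }

    static final class Cell {
        final Type type;
        final long timestamp;

        Cell(Type type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    // Counts the Puts a reader would see for one column: a DELETE_COLUMN
    // marker hides every Put whose timestamp is <= the marker's timestamp.
    static int visibleCount(List<Cell> cells) {
        long newestDelete = Long.MIN_VALUE;
        for (Cell c : cells) {
            if (c.type == Type.DELETE_COLUMN) {
                newestDelete = Math.max(newestDelete, c.timestamp);
            }
        }
        int visible = 0;
        for (Cell c : cells) {
            if (c.type == Type.PUT && c.timestamp > newestDelete) {
                visible++;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> column = new ArrayList<>();
        column.add(new Cell(Type.PUT, 75366L));            // timestamp from the log line
        column.add(new Cell(Type.DELETE_COLUMN, 75203L));  // timestamp from the log line
        column.add(new Cell(Type.PUT, 75100L));            // assumed older Put

        // Marker present: only the newer Put is visible.
        System.out.println(visibleCount(column)); // prints 1

        // Marker missing while the older Put is still there: both Puts show up.
        column.remove(1);
        System.out.println(visibleCount(column)); // prints 2
    }
}
```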


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234829#comment-13234829 ] 

Hudson commented on HBASE-5569:
-------------------------------

Integrated in HBase-TRUNK #2689 (See [https://builds.apache.org/job/HBase-TRUNK/2689/])
    HBASE-5569 Do not collect deleted KVs when they are still in use by a scanner. (Revision 1303220)

     Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>
> I noticed this because TestAtomicOperation.testMultiRowMutationMultiThreads fails rarely.
> The solution is similar to HBASE-2856, where expired KVs are not collected when in use by a scanner.
> ---
> What I pieced together so far is that it is the *scanning* side that has problems sometimes.
> Every time I see an assertion failure in the log, I see this just before it:
> {quote}
> 2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
> {quote}
> The order of the Put and Delete is sometimes reversed.
> The test threads should always see exactly one KV. If the "before" was the Put, the thread sees 0 KVs; if the "before" was the Delete, the thread sees 2 KVs.
> This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issue with scanning sometimes :(


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232365#comment-13232365 ] 

Hadoop QA commented on HBASE-5569:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518850/5569-v3.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.regionserver.TestCompaction
                  org.apache.hadoop.hbase.TestKeyValue

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1220//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1220//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1220//console

This message is automatically generated.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229552#comment-13229552 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

@Ted: Probably. Or make it a large test.
I'll leave the test running in a loop for the rest of the day before I conclude anything. There might just be lower concurrency now, and hence the problem is less likely to show up.
BTW, on my machine at home the time went from 70s to 400s.

I assume we'd see the same in a test with a CF with VERSIONS=1 where we put and scan in parallel. After HBASE-2856 went in, these puts could not be collected at flush time while they are used in a scan; with this change the same now happens for deletes.
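
A rough sketch of the retention rule described above, as a self-contained toy model in Java. None of this is the HBase implementation: `Cell`, `writeNumber`, `smallestReadPoint` and `flush` are hypothetical names, and the assumption is that each write carries an MVCC-style sequence number and that each open scanner has a read point below which it sees everything. Under that assumption, a Put masked by a delete marker is dropped at flush time only when every open scanner can already see the marker.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not the HBase implementation.
final class FlushRetentionSketch {

    enum Type { PUT, DELETE_COLUMN }

    static final class Cell {
        final Type type;
        final long timestamp;   // user-visible version timestamp
        final long writeNumber; // assumed MVCC-style sequence number of the write

        Cell(Type type, long timestamp, long writeNumber) {
            this.type = type;
            this.timestamp = timestamp;
            this.writeNumber = writeNumber;
        }
    }

    // Returns the cells of one column that survive a flush. A delete marker is
    // applied only if its write number is <= the smallest read point of any
    // open scanner; otherwise the Puts it would mask are kept for that scanner.
    static List<Cell> flush(List<Cell> cells, long smallestReadPoint) {
        long appliedDeleteTs = Long.MIN_VALUE;
        for (Cell c : cells) {
            if (c.type == Type.DELETE_COLUMN && c.writeNumber <= smallestReadPoint) {
                appliedDeleteTs = Math.max(appliedDeleteTs, c.timestamp);
            }
        }
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            boolean masked = c.type == Type.PUT && c.timestamp <= appliedDeleteTs;
            if (!masked) {
                kept.add(c); // markers themselves are always kept in this toy flush
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> column = new ArrayList<>();
        column.add(new Cell(Type.PUT, 100L, 5L));
        column.add(new Cell(Type.DELETE_COLUMN, 100L, 8L));

        // A scanner at read point 6 cannot see the marker (write 8) yet,
        // so the Put it still needs survives the flush.
        System.out.println(flush(column, 6L).size()); // prints 2

        // Once the oldest read point has moved past the marker, the Put can go.
        System.out.println(flush(column, 9L).size()); // prints 1
    }
}
```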
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232896#comment-13232896 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Ran more variations of the test (different number of threads, loops, synchronized flushing or not). Each time I see a failure after 2-3 runs without the patch, and no failures with the patch after at least 20 iterations.

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228993#comment-13228993 ] 

Zhihong Yu commented on HBASE-5569:
-----------------------------------

@Chunhui:
This makes sense.

Looks like the test case can utilize HBASE-5515: Add a processRow API that supports atomic multiple reads and writes on a row
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230428#comment-13230428 ] 

stack commented on HBASE-5569:
------------------------------

@Lars Good one.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232341#comment-13232341 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

I spent a lot more time looking at this. I thought it might be due to the flushes being executed in parallel by multiple threads, but synchronizing this part made the failure more likely!
Doing this and increasing the frequency of flushes now reproduces the problem multiple times on every test run, which is good.

But... My initial hunch was correct. When I enable KEEP_DELETED_CELLS on the store the problem goes away!
Hence this definitely has to do with collection of deletes and delete markers.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231885#comment-13231885 ] 

Hudson commented on HBASE-5569:
-------------------------------

Integrated in HBase-0.94 #38 (See [https://builds.apache.org/job/HBase-0.94/38/])
    Revert HBASE-5569 (Revision 1301873)

     Result = SUCCESS
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228205#comment-13228205 ] 

stack commented on HBASE-5569:
------------------------------

bq. This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issue with scanning sometimes

Or is this a bug we've introduced recently?
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>


        

[jira] [Updated] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Attachment: 5569.txt

Here's the patch. Still running tests in a loop; no failures yet.
Attaching here, so that I can get a HadoopQA run.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230537#comment-13230537 ] 

Hudson commented on HBASE-5569:
-------------------------------

Integrated in HBase-TRUNK #2683 (See [https://builds.apache.org/job/HBase-TRUNK/2683/])
    HBASE-5569  Do not collect deleted KVs when they are still in use by a scanner. (Revision 1301135)

     Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231815#comment-13231815 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Ran testRowMutationMultiThreads another 1000 times on my work machine without any failures.
Then I ran it at home (much slower machine, but fast SSD) and indeed saw a failure pretty quickly. Hmm...

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "nkeywal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232382#comment-13232382 ] 

nkeywal commented on HBASE-5569:
--------------------------------

I've got testRowMutationMultiThreads running on patch v3 currently. No issues so far. I will make it run 5000 times; previously it always failed before 1000 iterations.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "nkeywal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232390#comment-13232390 ] 

nkeywal commented on HBASE-5569:
--------------------------------

Right now it's still running well. I'm running the test on a small server with a 4-core Intel Xeon E3-1220.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229028#comment-13229028 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

But note that sometimes the other case happens, and we see two rows!
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232393#comment-13232393 ] 

Hadoop QA commented on HBASE-5569:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518856/5569-v4.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1221//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1221//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1221//console

This message is automatically generated.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228998#comment-13228998 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Well... The whole point of the new API was to have atomic operations.
The Put and the Delete are executed atomically together and become visible at the same time.
Note that the code alternates between putting row and deleting row2, and putting row2 and deleting row. The scan then ensures that exactly one column is visible.

In this case the scan *itself* is inconsistent. Worse, as Nicolas (N) found out, even testRowMutationMultiThreads fails sometimes; that test touches just a single row, so this should never happen.

So I am not entirely convinced the test is at fault.

For example, take the scenario described above, in which
{code}
// one atomic batch: put into row2, delete the same column from row
Put p = new Put(row2, ts);
p.add(fam1, qual1, value1);
mrm.add(p);
Delete d = new Delete(row);
d.deleteColumns(fam1, qual1, ts);
mrm.add(d);
{code}
happens between
{code}
region.mutateRowsWithLocks(mrm, rowsToLock);
{code}

and
{code}
Scan s = new Scan(row);
RegionScanner rs = region.getScanner(s);
List<KeyValue> r = new ArrayList<KeyValue>();
// drain the scanner; r collects every KV it returns
while(rs.next(r));
{code}

Both the Put and the Delete would still happen atomically, with the same WALEdit and the same MVCC writepoint, so the scan would simply see the other row.
This has nothing to do with race conditions between threads; it only occurs with flushes in the test. I'll remove the forced flushes and then run the test again.
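
The invariant the test relies on can be sketched with a toy model. This is not HBase code: AtomicSwapSketch, Cell, putAndDelete and visibleRows are hypothetical names, and a single counter stands in for the MVCC write point. It only illustrates why a scan should see exactly one row when the Put and the Delete are published together.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are hypothetical classes, not HBase code.
class AtomicSwapSketch {
    // One stored cell: its row, the write number it was committed under,
    // and whether it is a delete marker or a put.
    static final class Cell {
        final String row;
        final long writeNumber;
        final boolean isDelete;

        Cell(String row, long writeNumber, boolean isDelete) {
            this.row = row;
            this.writeNumber = writeNumber;
            this.isDelete = isDelete;
        }
    }

    private final List<Cell> cells = new ArrayList<>();
    private long nextWriteNumber = 1; // stand-in for an MVCC write point
    private long readPoint = 0;       // newest write number a scan may see

    // Put into one row and delete from the other under a single write number,
    // then publish that number once so both changes become visible together.
    synchronized void putAndDelete(String putRow, String deleteRow) {
        long w = nextWriteNumber++;
        cells.add(new Cell(putRow, w, false));
        cells.add(new Cell(deleteRow, w, true));
        readPoint = w;
    }

    // Count the rows whose newest cell at or below the read point is a put.
    synchronized int visibleRows(String... rows) {
        int visible = 0;
        for (String row : rows) {
            Cell newest = null;
            for (Cell c : cells) {
                boolean readable = c.row.equals(row) && c.writeNumber <= readPoint;
                if (readable && (newest == null || c.writeNumber > newest.writeNumber)) {
                    newest = c;
                }
            }
            if (newest != null && !newest.isDelete) {
                visible++;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        AtomicSwapSketch store = new AtomicSwapSketch();
        for (int i = 0; i < 10; i++) {
            // Alternate the direction of the swap on every iteration.
            if (i % 2 == 0) {
                store.putAndDelete("rowA", "rowB");
            } else {
                store.putAndDelete("rowB", "rowA");
            }
            if (store.visibleRows("rowA", "rowB") != 1) {
                throw new AssertionError("expected exactly one visible row at iteration " + i);
            }
        }
        System.out.println("exactly one row visible after every swap");
    }
}
```

In this model a reader can never observe 0 or 2 rows, because the put and the delete share one write number and the read point moves only once per batch. Seeing 0 or 2 KVs in the real test therefore points at the read path rather than at the batching of the mutation.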
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232366#comment-13232366 ] 

stack commented on HBASE-5569:
------------------------------

Nice work Lars.  Will review/test tomorrow.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Reopened] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl reopened HBASE-5569:
----------------------------------

    
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Issue Comment Edited] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229028#comment-13229028 ] 

Lars Hofhansl edited comment on HBASE-5569 at 3/14/12 6:48 AM:
---------------------------------------------------------------

But note that sometimes the other case happens, and we see two rows!
                
      was (Author: lhofhansl):
    But note that sometime the other case happens, and we two rows!
                  
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Updated] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to 0.94 and trunk
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229813#comment-13229813 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Will run tests overnight and commit tomorrow morning unless I see a test failure or get any objections.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Updated] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Assignee: Lars Hofhansl
      Status: Patch Available  (was: Open)
    
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228200#comment-13228200 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Here's the other case.
{quote}
2012-03-13 01:34:06,674 DEBUG [Thread-287] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/56043/DeleteColumn/vlen=0,and after = rowB/colfamily11:qual1/54931/Put/vlen=6
{quote}

Locally I have not been able to reproduce this, yet.
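
For readers decoding the two log lines: each KV is printed as row/family:qualifier/timestamp/type. The sketch below is a toy model, not HBase code; KV and visiblePuts are hypothetical names. It assumes only the documented DeleteColumn semantics, where a marker masks every version of the column at or below its own timestamp, and it ignores version limits and read points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: KV is a hypothetical class, not HBase's KeyValue.
class DeleteMaskingSketch {
    static final class KV {
        final long timestamp;
        final boolean isDeleteColumn; // true: DeleteColumn marker, false: Put

        KV(long timestamp, boolean isDeleteColumn) {
            this.timestamp = timestamp;
            this.isDeleteColumn = isDeleteColumn;
        }
    }

    // Count the Puts of a single column that no DeleteColumn marker masks.
    static int visiblePuts(List<KV> column) {
        List<KV> sorted = new ArrayList<>(column);
        // Newest timestamp first; at equal timestamps the marker sorts before the Put.
        sorted.sort((a, b) -> {
            if (a.timestamp != b.timestamp) {
                return Long.compare(b.timestamp, a.timestamp);
            }
            return Boolean.compare(b.isDeleteColumn, a.isDeleteColumn);
        });
        boolean masked = false;
        int visible = 0;
        for (KV kv : sorted) {
            if (kv.isDeleteColumn) {
                masked = true; // everything after this point is at or below the marker
            } else if (!masked) {
                visible++;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Timestamps taken from the two log lines quoted in this issue.
        List<KV> putIsNewer = Arrays.asList(new KV(75366, false), new KV(75203, true));
        List<KV> deleteIsNewer = Arrays.asList(new KV(56043, true), new KV(54931, false));
        System.out.println(visiblePuts(putIsNewer));    // prints 1
        System.out.println(visiblePuts(deleteIsNewer)); // prints 0
    }
}
```

The masking decision depends entirely on the marker being encountered before the older Puts, which is why a scanner whose position changes underneath it can miscount.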
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228204#comment-13228204 ] 

stack commented on HBASE-5569:
------------------------------

Are these the failures we saw up on builds.apache.org?  There was a failure in hadoopqa too.  Are you including that?  Good on you, Lars.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "nkeywal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232513#comment-13232513 ] 

nkeywal commented on HBASE-5569:
--------------------------------

I stopped it after 2700 iterations (10 hours), no errors => the patch seems to fix the issue...
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230913#comment-13230913 ] 

Hudson commented on HBASE-5569:
-------------------------------

Integrated in HBase-TRUNK-security #139 (See [https://builds.apache.org/job/HBase-TRUNK-security/139/])
    HBASE-5569  Do not collect deleted KVs when they are still in use by a scanner. (Revision 1301135)

     Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Updated] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Status: Patch Available  (was: Open)
    
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232184#comment-13232184 ] 

Hudson commented on HBASE-5569:
-------------------------------

Integrated in HBase-TRUNK-security #141 (See [https://builds.apache.org/job/HBase-TRUNK-security/141/])
    Revert HBASE-5569 (Revision 1301872)

     Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231680#comment-13231680 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Hmm... I ran the tests (all of them, including testRowMutationMultiThreads) over 4000 times, and they didn't fail.
testMultiRowMutationMultiThreads is definitely fixed (failed after a few dozen executions before).

There might be yet another much rarer problem with testRowMutationMultiThreads. I've never seen it fail on the build machines, yet.

Any chance you could attach the latest logs (as zip or tar)?

Btw, this:
{code}
2012-03-14 03:14:02,146 DEBUG [Thread-51] regionserver.TestAtomicOperation$1(305): keyvalues=NONE
Exception in thread "Thread-51" 
junit.framework.AssertionFailedError
	at junit.framework.Assert.fail(Assert.java:48)
	at junit.framework.Assert.fail(Assert.java:56)
	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$1.run(TestAtomicOperation.java:307)
2012-03-14 03:14:02,228 DEBUG [Thread-92] regionserver.TestAtomicOperation$1(279): flushing
{code}
is just where the test detects the problem. The actual problem should be in the logs some time before that.

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Updated] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------


Committed to 0.94 and trunk. Pheeww.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233682#comment-13233682 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Thanks. Going to commit soon.
@Stack: wanna have a quick look (also at my comment from 19/Mar/12 15:50)? 
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228206#comment-13228206 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

A bit more context:

One case:
{quote}
2012-03-12 21:48:49,497 INFO  [Thread-260] regionserver.Store(796): Added /home/jenkins/jenkins-slave/workspace/HBase-TRUNK/trunk/target/test-data/e923fe0e-3b3e-4c67-89ec-4cac8c991955/TestIncrementtestMultiRowMutationMultiThreads/testtable/446f80b650aa093734c2dff4b9581ff8/colfamily11/e0930b6c478c4a5db9eceaead90bc80e, entries=7, sequenceid=75545, filesize=1.0k
2012-03-12 21:48:49,522 INFO  [Thread-260] regionserver.HRegion(1552): Finished memstore flush of ~87.4k/89544, currentsize=20.8k/21320 for region testtable,,1331588915162.446f80b650aa093734c2dff4b9581ff8. in 63ms, sequenceid=75545, compaction requested=true
2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.TestAtomicOperation$2(390): []
Exception in thread "Thread-211" junit.framework.AssertionFailedError
	at junit.framework.Assert.fail(Assert.java:48)
	at junit.framework.Assert.fail(Assert.java:56)
	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$2.run(TestAtomicOperation.java:392)
{quote}

Another case:
{quote}
2012-03-13 01:34:06,655 INFO  [Thread-212] regionserver.Store(748): Flushed , sequenceid=56173, memsize=1.8k, into tmp file /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/48f0d653-d644-41be-80ff-90e726af10d4/TestIncrementtestMultiRowMutationMultiThreads/testtable/00fad569500db871769b9d5951b3ed16/.tmp/a0e1d5df9b5344c19ddbc7b11e0cd9d2
2012-03-13 01:34:06,656 DEBUG [Thread-212] regionserver.Store(773): Renaming flushed file at /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/48f0d653-d644-41be-80ff-90e726af10d4/TestIncrementtestMultiRowMutationMultiThreads/testtable/00fad569500db871769b9d5951b3ed16/.tmp/a0e1d5df9b5344c19ddbc7b11e0cd9d2 to /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/48f0d653-d644-41be-80ff-90e726af10d4/TestIncrementtestMultiRowMutationMultiThreads/testtable/00fad569500db871769b9d5951b3ed16/colfamily11/a0e1d5df9b5344c19ddbc7b11e0cd9d2
2012-03-13 01:34:06,661 INFO  [Thread-212] regionserver.Store(796): Added /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/48f0d653-d644-41be-80ff-90e726af10d4/TestIncrementtestMultiRowMutationMultiThreads/testtable/00fad569500db871769b9d5951b3ed16/colfamily11/a0e1d5df9b5344c19ddbc7b11e0cd9d2, entries=11, sequenceid=56173, filesize=1.2k
2012-03-13 01:34:06,674 DEBUG [Thread-287] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/56043/DeleteColumn/vlen=0,and after = rowB/colfamily11:qual1/54931/Put/vlen=6
2012-03-13 01:34:06,674 DEBUG [Thread-287] regionserver.TestAtomicOperation$2(390): [rowA/colfamily11:qual1/56043/Put/vlen=6, rowB/colfamily11:qual1/54931/Put/vlen=6]
Exception in thread "Thread-287" junit.framework.AssertionFailedError
	at junit.framework.Assert.fail(Assert.java:48)
	at junit.framework.Assert.fail(Assert.java:56)
	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$2.run(TestAtomicOperation.java:392)
2012-03-13 01:34:06,675 INFO  [Thread-212] regionserver.HRegion(1552): Finished memstore flush of ~380.5k/389664, currentsize=28.5k/29192 for region testtable,,1331602436835.00fad569500db871769b9d5951b3ed16. in 44ms, sequenceid=56173, compaction requested=true
{quote}

So it seems this is related to flushing (the test flushes frequently, about once per second, precisely to exercise this scenario).
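
To make the "exactly one KV" invariant concrete, here is a minimal toy model in Java. It is not the code of TestAtomicOperation and uses no HBase classes: the names AtomicInvariantSketch, Write, history and visibleKvs are hypothetical, and it assumes (based only on the rowA/rowB entries in the logs above) that each atomic mutation puts to one row and deletes from the other under a single write number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names exist in HBase.
class AtomicInvariantSketch {

    /** One cell-level write: a Put or a Delete on a row, tagged with the write number of its mutation. */
    static final class Write {
        final String row;
        final boolean isPut;
        final long writeNumber;

        Write(String row, boolean isPut, long writeNumber) {
            this.row = row;
            this.isPut = isPut;
            this.writeNumber = writeNumber;
        }
    }

    /** Builds n atomic mutations; mutation i puts to one row and deletes the other, sharing write number i. */
    static List<Write> history(int n) {
        List<Write> h = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            boolean putA = (i % 2 == 1); // assumed alternation between the two rows
            h.add(new Write("rowA", putA, i));
            h.add(new Write("rowB", !putA, i));
        }
        return h;
    }

    /** Counts the rows whose newest write visible at readPoint is a Put. */
    static int visibleKvs(List<Write> history, long readPoint) {
        Map<String, Write> newest = new HashMap<>();
        for (Write w : history) {
            if (w.writeNumber > readPoint) {
                continue; // written after the scanner's snapshot, so invisible to it
            }
            Write cur = newest.get(w.row);
            if (cur == null || w.writeNumber > cur.writeNumber) {
                newest.put(w.row, w);
            }
        }
        int count = 0;
        for (Write w : newest.values()) {
            if (w.isPut) {
                count++;
            }
        }
        return count;
    }
}
```

In this model every read point sees exactly one KV as long as the full history is available. If a Put that an older read point still depends on is dropped from the history, that read point sees 0 KVs, which matches the shape of the first failure above.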

                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229498#comment-13229498 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

HBASE-2856 added logic about when KVs can be expired (either by version or TTL), but it did not add the same logic for deleted rows (i.e. for deletes the rug can be pulled from under a scan).
I added that (which ended up being a one-line change once I understood what was going on). Running the test in a loop now.
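
As a rough illustration of the idea (not the actual patch), the sketch below models a flush or compaction deciding which cells of one column to write out. All names (DeleteRetentionSketch, Cell, retain, oldestReadPoint, protectScanners) are hypothetical, and the model assumes that a scanner with read point R sees only cells whose write number is at most R. Under that assumption, a delete whose write number is newer than the oldest open read point must not be allowed to hide older Puts yet.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names exist in HBase.
class DeleteRetentionSketch {
    enum Type { PUT, DELETE_COLUMN }

    /** A simplified cell: its type, its timestamp, and the write number that created it. */
    static final class Cell {
        final Type type;
        final long timestamp;
        final long writeNumber;

        Cell(Type type, long timestamp, long writeNumber) {
            this.type = type;
            this.timestamp = timestamp;
            this.writeNumber = writeNumber;
        }
    }

    /**
     * Returns the cells that survive, given the cells of one column sorted newest first.
     * oldestReadPoint is the smallest read point among the open scanners.
     */
    static List<Cell> retain(List<Cell> newestFirst, long oldestReadPoint, boolean protectScanners) {
        List<Cell> kept = new ArrayList<>();
        long deleteTs = Long.MIN_VALUE; // newest timestamp covered by a delete we may act on
        for (Cell c : newestFirst) {
            if (c.type == Type.DELETE_COLUMN) {
                // Some open scanner cannot see this delete yet, so it still needs the covered Puts.
                boolean hiddenFromSomeScanner = protectScanners && c.writeNumber > oldestReadPoint;
                if (!hiddenFromSomeScanner) {
                    deleteTs = Math.max(deleteTs, c.timestamp);
                }
                kept.add(c); // delete markers themselves are always kept in this sketch
            } else if (c.timestamp > deleteTs) {
                kept.add(c); // not covered by any delete we are allowed to act on
            }
            // otherwise: the Put is covered by a delete every scanner can see, so it is dropped
        }
        return kept;
    }
}
```

With protectScanners set to false, the Put covered by a recent delete is dropped even though an older scanner may still need it, which is the "rug pulled from under a scan" situation. With it set to true, the Put is kept until the oldest read point has caught up with the delete.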
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228568#comment-13228568 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

The check that issues the above DEBUG message was added as part of HBASE-5121.
Interestingly that issue is only about major compactions, and this test does not have any major compactions, so maybe HBASE-5121 is incorrect?
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229523#comment-13229523 ] 

Zhihong Yu commented on HBASE-5569:
-----------------------------------

+1 on patch.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Updated] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Attachment: 5569-v2.txt

Same change.
In addition, I reduced the number of threads to 50 and the number of iterations to 500 to bring the test runtime to about 15s (on my machine).
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229703#comment-13229703 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

If I remove the scanning from the tests, the times are back to what they were before, suggesting that the extra work is due to keeping (and flushing) deleted cells that cannot be collected because they are part of a scan.

I'm happy with this outcome, and I would like to commit this change. 
Ted +1'd, I'm +1 (obviously), but it wouldn't hurt to have another pair of eyes or two looking at this.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Updated] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Attachment: 5569-v3.txt

New patch.
Also adds code to show the memstoreTS in KV.toString.
The number of loops in TestAtomicOperation was reduced and the number of flushes increased.

Please have a careful look.
If possible, it would be very helpful if some other folks could run TestAtomicOperation in a loop for a while (considering that this problem did not occur at all on my work machine).

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>
> I noticed this because TestAtomicOperation.testMultiRowMutationMultiThreads fails rarely.
> The solution is similar to HBASE-2856, where expired KVs are not collected when in use by a scanner.
> ---
> What I pieced together so far is that it is the *scanning* side that has problems sometimes.
> Every time I see an assertion failure in the log, I see this right before it:
> {quote}
> 2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
> {quote}
> The order of the Put and Delete is sometimes reversed.
> The test threads should always see exactly one KV. If the "before" was the Put, the threads see 0 KVs; if the "before" was the Delete, the threads see 2 KVs.
> This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issue with scanning sometimes :(


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230295#comment-13230295 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

3500 test runs, no failures. Going to commit if nobody objects.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229018#comment-13229018 ] 

chunhui shen commented on HBASE-5569:
-------------------------------------

@Lars
Maybe I did not explain this clearly.

We could consider the following scenario:

Time 1, Thread 1: row is deleted and row2 is put, so the only live KV in HBase is now row2.

Time 2, Thread 1: runs RegionScanner rs = region.getScanner(s); the RS opens the scanner and positions it so that the next KV is row2.

Time 3, Thread 2: row2 is deleted and row is put, so the only live KV in HBase is now row.

Time 4, Thread 1: runs while(rs.next(r)); the scanner is still pointing at row2, which is now deleted, so rs.next(r) returns nothing even though row is in HBase.

To fix this issue, we should do scanner.seek in scanner.next() rather than in the construction of the scanner.
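
The four steps above can be replayed in a toy model. This is only a sketch under loud assumptions: a sorted set of row keys stands in for the region, and EagerScanner, LazyScanner and replay are hypothetical names made up for illustration, not HBase classes. It shows the timing effect of seeking at construction versus at first read, nothing more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: a sorted set of live row keys stands in for the region.
// None of these classes are HBase classes.
class SeekTimingSketch {

    // Hypothetical scanner that seeks once, in its constructor.
    static final class EagerScanner {
        private final TreeSet<String> store;
        private final String positionedAt; // row chosen when the scanner was opened

        EagerScanner(TreeSet<String> store) {
            this.store = store;
            this.positionedAt = store.isEmpty() ? null : store.first();
        }

        List<String> readAll() {
            List<String> out = new ArrayList<>();
            // Walk forward from the stale position; rows sorting before it are never visited.
            for (String row = positionedAt; row != null; row = store.higher(row)) {
                if (store.contains(row)) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    // Hypothetical scanner that defers the seek until the first read.
    static final class LazyScanner {
        private final TreeSet<String> store;

        LazyScanner(TreeSet<String> store) {
            this.store = store;
        }

        List<String> readAll() {
            return new ArrayList<>(store);
        }
    }

    // Replays Time 1..4 from the scenario and returns what the chosen scanner saw.
    static List<String> replay(boolean eager) {
        TreeSet<String> store = new TreeSet<>();
        store.add("row2");                              // Time 1: only row2 is live
        EagerScanner eagerScanner = new EagerScanner(store); // Time 2: scanner opened
        LazyScanner lazyScanner = new LazyScanner(store);
        store.remove("row2");                           // Time 3: row2 deleted,
        store.add("row");                               //         row put
        return eager ? eagerScanner.readAll() : lazyScanner.readAll(); // Time 4
    }

    public static void main(String[] args) {
        System.out.println("seek at construction sees: " + replay(true));
        System.out.println("seek at first read sees:   " + replay(false));
    }
}
```

In this model the eager scanner returns an empty list and the lazy one returns row. Note that deferring the seek only moves the window; it does not by itself give a scan a consistent snapshot.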
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231873#comment-13231873 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Reverted from 0.94 and trunk. Sigh.
A few more details:
* This definitely has something to do with StoreScanner.{checkReseek|resetScannerStack}.
* I *always* see the DEBUG message about the StoreScanner.peek being changed.
* Removing the code for HBASE-5121 does *not* fix this problem.
* This is not related to HBASE-5568.
* The new KV on the heap is always older than the existing one (so the scanner is going backwards in this case)! In this test the client threads assign the timestamps, so one of them might just fall behind.
* The new KV on the head always has memstoreTS=0.
* Either the new or the old KV is a delete marker (but that might be because of the nature of this test).
* Both testRowMutationMultiThreads and testMultiRowMutationMultiThreads have the same problem. So this happens even for Puts and Deletes for the *same* row, even when they are written with the same mvcc writenumber and in the same WALEdit.

I'll see if I can write a more deterministic test for this.

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228665#comment-13228665 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

I can try to back out HBASE-5121 and see if I can still get this to fail.

I do think my assumptions about scanning were wrong, though. HBASE-5229 is still valid (in that it allows a bunch of operations across multiple rows to either all fail or all succeed); it is just that there is currently no way to get a consistent scan over *multiple* rows when flushing is involved (which is OK, because the scanner contract never guaranteed that). If that is the case I should disable the test.

TestAtomicOperation.testRowMutationMultiThreads basically does the same thing, only within the same row, and I have never seen that one fail.

                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>

        

[jira] [Updated] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "nkeywal (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5569:
---------------------------

    Attachment: TestAtomicOperation-output.trunk_120313.rar

testRowMutationMultiThreads logs, on trunk as of today. It failed after 200 iterations.

{noformat}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.007 sec <<< FAILURE!
testRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation)  Time elapsed: 8.651 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<0> but was:<8>
	at junit.framework.Assert.fail(Assert.java:50)
	at junit.framework.Assert.failNotEquals(Assert.java:287)
	at junit.framework.Assert.assertEquals(Assert.java:67)
	at junit.framework.Assert.assertEquals(Assert.java:199)
	at junit.framework.Assert.assertEquals(Assert.java:205)
	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testRowMutationMultiThreads(TestAtomicOperation.java:331)
{noformat}
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Updated] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Description: 
I noticed this because TestAtomicOperation.testMultiRowMutationMultiThreads fails rarely.
The solution is similar to HBASE-2856, where expired KVs are not collected when in use by a scanner.

---
What I pieced together so far is that it is the *scanning* side that has problems sometimes.

Every time I see an assertion failure in the log, I see this right before it:
{quote}
2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
{quote}
The order of the Put and Delete is sometimes reversed.

The test threads should always see exactly one KV. If the "before" was the Put, the threads see 0 KVs; if the "before" was the Delete, the threads see 2 KVs.

This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issue with scanning sometimes :(


  was:
What I pieced together so far is that it is the *scanning* side that has problems sometimes.

Every time I see an assertion failure in the log, I see this right before it:
{quote}
2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
{quote}
The order of the Put and Delete is sometimes reversed.

The test threads should always see exactly one KV. If the "before" was the Put, the threads see 0 KVs; if the "before" was the Delete, the threads see 2 KVs.

This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issue with scanning sometimes :(


       Priority: Major  (was: Minor)
        Summary: Do not collect deleted KVs when they are still in use by a scanner.  (was: TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally)
    
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231818#comment-13231818 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

So I suppose this can happen when the two deletes differ only by memstoreTS.
This is a different problem from the one I fixed in this issue.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "nkeywal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231610#comment-13231610 ] 

nkeywal commented on HBASE-5569:
--------------------------------

fwiw, I still have the error on testRowMutationMultiThreads, after a few hundred iterations... Same logs as above.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Updated] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Status: Open  (was: Patch Available)
    
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228546#comment-13228546 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

I cannot make this test fail locally it seems. Running in a loop for an hour now (test takes ~12s on my machine).
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229542#comment-13229542 ] 

Zhihong Yu commented on HBASE-5569:
-----------------------------------

Currently the test is marked as a medium test.
Can we lower the number of threads in the test?
{code}
    for (int i = 0; i < numThreads; i++) {
{code}
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Updated] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Attachment: 5569-v4.txt

Same patch with fixes for TestKeyValue and TestCompaction.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232693#comment-13232693 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Thanks N. Good news! Tests pass too. I'm going to wait for some other folks to test on their machines to be extra sure this time.

I need to be extra clear here:
This patch will prevent any deleted KVs from being collected upon flush or compaction if there is a scanner open with a readpoint smaller than the KV's memstoreTS (HBASE-2856 does the same for expired KVs).
Furthermore, this is only needed for mixed delete and put operations, although it will generally prevent a flush/compaction from pulling the rug out from under a scanner.
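
The rule stated above can be sketched as a small toy model. This is only an illustration, not HBase code: the class and method names (DeleteCollectionRule, smallestReadPoint, mayCollect) are hypothetical, and read points and memstoreTS values are modeled as plain longs.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the rule described above; this is not HBase code. */
class DeleteCollectionRule {

    /** Smallest read point among open scanners, or Long.MAX_VALUE if none are open. */
    static long smallestReadPoint(List<Long> openScannerReadPoints) {
        long min = Long.MAX_VALUE;
        for (long readPoint : openScannerReadPoints) {
            min = Math.min(min, readPoint);
        }
        return min;
    }

    /** True if no open scanner has a read point smaller than the delete's memstoreTS. */
    static boolean mayCollect(long deleteMemstoreTS, List<Long> openScannerReadPoints) {
        return deleteMemstoreTS <= smallestReadPoint(openScannerReadPoints);
    }

    public static void main(String[] args) {
        // A scanner at read point 7 cannot yet see a delete written at memstoreTS 10.
        System.out.println(mayCollect(10L, Arrays.asList(7L, 12L)));
        // Once every scanner is at or past 10, the delete may take effect.
        System.out.println(mayCollect(10L, Arrays.asList(10L, 12L)));
    }
}
```

With no scanners open, the sketch treats every delete as collectable, which matches the idea that only scanners with an older read point hold collection back.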

Personally, I think this is an important fix. However, I want to mention that the alternative is to remove the mutateRows functionality (obviously not my favorite choice), or to document that it only works with KEEP_DELETED_CELLS enabled (also not my favorite outcome).

                

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232359#comment-13232359 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

This
{code}
if (includeDeleteMarker
    && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
   this.deletes.add(bytes, offset, qualLength, timestamp, type);
}
{code}
fixes the issue. Note that maxReadPointToTrackVersions is actually the minimum readpoint of any scanner still operating in the region, and it is *only* set during compaction.
I think this is correct because of the following:
All delete markers precede the KVs they affect. So by not adding the delete marker, it is guaranteed that no KVs that might still be in use will be removed during flush. It also removes this race condition between scanners and flushes.
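
To make the ordering argument concrete, here is a toy single-column compaction loop. It is a simplified sketch under stated assumptions, not the ScanQueryMatcher code: ToyCompaction, Cell and compact are hypothetical names, cells are assumed to be sorted newest timestamp first so that a marker precedes the puts it covers, and the marker itself is always kept in the output.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy single-column compaction; a simplified model, not the ScanQueryMatcher code. */
class ToyCompaction {

    static final class Cell {
        final long timestamp;
        final boolean isDelete;   // true for a DeleteColumn-style marker
        final long memstoreTS;

        Cell(long timestamp, boolean isDelete, long memstoreTS) {
            this.timestamp = timestamp;
            this.isDelete = isDelete;
            this.memstoreTS = memstoreTS;
        }
    }

    /** Cells must be sorted newest timestamp first, so a marker precedes the puts it covers. */
    static List<Cell> compact(List<Cell> sortedCells, long smallestReadPoint) {
        List<Cell> out = new ArrayList<>();
        long trackedDeleteTs = Long.MIN_VALUE; // no delete tracked yet
        for (Cell c : sortedCells) {
            if (c.isDelete) {
                // Track the marker only if every open scanner can already see it.
                if (c.memstoreTS <= smallestReadPoint) {
                    trackedDeleteTs = Math.max(trackedDeleteTs, c.timestamp);
                }
                out.add(c); // simplification: the marker itself is always kept
            } else if (c.timestamp > trackedDeleteTs) {
                out.add(c); // not covered by any tracked delete
            }
            // Otherwise the put is covered by a tracked delete and is dropped.
        }
        return out;
    }
}
```

Because an untracked marker never updates trackedDeleteTs, the puts behind it survive the compaction and remain visible to the older scanner.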

So my previous fix was almost correct (in thought at least). I had believed it to be correct, because I had not been able - not even a single time - to reproduce this on my work machine.
I'll attach a patch soon.

                

        

[jira] [Updated] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)
    

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230207#comment-13230207 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Ran all tests in TestAtomicOperation more than 3000 times without a failure.
                

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232383#comment-13232383 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

TestCompaction.testMajorCompactingToNoOutput fails because the first scanner in the test was not closed, then the compaction was done. Hence the compaction could not remove the deleted rows, because a scanner is still (potentially) using them.

The test is easily fixed (need to close the first scanner), but we need to think about whether this is the design we want.
This *is* the same behavior we have with HBASE-2856 for expired rows (TTL or too many versions): If a scanner is open with an earlier readpoint, these will not be collected.
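
The effect of the unclosed scanner can be illustrated with a toy region that tracks the read points of its open scanners. This is a hedged sketch only: ToyRegion, ToyScanner, commitMutation, openScanner and smallestReadPoint are hypothetical names and do not correspond to the real HRegion API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy region that remembers the read point of every open scanner; not HBase code. */
class ToyRegion {
    private long latestWritePoint = 0; // advances with each committed mutation
    private final List<ToyScanner> openScanners = new ArrayList<>();

    void commitMutation() {
        latestWritePoint++;
    }

    /** A new scanner reads as of the latest committed write point. */
    ToyScanner openScanner() {
        ToyScanner scanner = new ToyScanner(this, latestWritePoint);
        openScanners.add(scanner);
        return scanner;
    }

    /** Oldest read point still in use; the latest write point if no scanner is open. */
    long smallestReadPoint() {
        long min = latestWritePoint;
        for (ToyScanner scanner : openScanners) {
            min = Math.min(min, scanner.readPoint);
        }
        return min;
    }

    static final class ToyScanner implements AutoCloseable {
        final long readPoint;
        private final ToyRegion region;

        ToyScanner(ToyRegion region, long readPoint) {
            this.region = region;
            this.readPoint = readPoint;
        }

        /** Closing the scanner releases its hold on the old read point. */
        @Override
        public void close() {
            region.openScanners.remove(this);
        }
    }
}
```

In this model, a compaction that runs while the first scanner is still open sees the old read point and must retain the deleted rows. Closing the scanner first (for example with try-with-resources) lets the smallest read point advance.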

                

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228988#comment-13228988 ] 

chunhui shen commented on HBASE-5569:
-------------------------------------

I think this may be a problem with the test case.

Between the time thread1 executes
{code}region.mutateRowsWithLocks(mrm, rowsToLock);{code}
and
{code}
Scan s = new Scan(row);
RegionScanner rs = region.getScanner(s);
List<KeyValue> r = new ArrayList<KeyValue>();
while(rs.next(r));
{code}

another thread, thread2, may execute
{code}
Put p = new Put(row2, ts);
p.add(fam1, qual1, value1);
mrm.add(p);
Delete d = new Delete(row);
d.deleteColumns(fam1, qual1, ts);
mrm.add(d);
{code}
which deletes row, so thread1 may not get any data back.

Please correct me if I am wrong, thanks.
                

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229016#comment-13229016 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Can't unpack the rar file (guess I need the non-free unrar package, and as a principle I do not install non-free software on my machines).
What I really just need to know is whether there are messages like those in the description right before any assertion failures.
                

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230492#comment-13230492 ] 

Hudson commented on HBASE-5569:
-------------------------------

Integrated in HBase-0.94 #32 (See [https://builds.apache.org/job/HBase-0.94/32/])
    HBASE-5569  Do not collect deleted KVs when they are still in use by a scanner. (Revision 1301138)

     Result = SUCCESS
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java

                

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233986#comment-13233986 ] 

Hudson commented on HBASE-5569:
-------------------------------

Integrated in HBase-0.94 #43 (See [https://builds.apache.org/job/HBase-0.94/43/])
    HBASE-5569 Do not collect deleted KVs when they are still in use by a scanner. (Revision 1303222)

     Result = SUCCESS
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

                

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233734#comment-13233734 ] 

stack commented on HBASE-5569:
------------------------------

Your 'being extra clear' note needs to become the release note.

What does this mean?

bq. This patch will prevent any deleted KVs from being collected upon flush or compaction if there is a scanner open with a readpoint smaller than the KV's memstoreTS (HBASE-2856 does the same for expired KVs).

Do they stay in the memstore or in the snapshot? Or rather, are they attached to the outstanding scanners? Or rather, do we still 'see' them in files or memstores if there are outstanding scanners and the delete is newer than the scanner read point?

Patch looks fine -- makes sense even -- but I'm not up on subtleties that abound in this code.

We didn't output ts in KV's toString?  That's odd.

+1



                

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231876#comment-13231876 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

One last point: This seems to be extremely sensitive to the machine it is running on.
Among the various loops I ran on my work machine I ran the test close to 10000 times and have not observed a single failure on that machine (with my changes applied), while on my home machine this is relatively easy to reproduce.

                

        

[jira] [Updated] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Status: Patch Available  (was: Reopened)

Getting a test run.
                

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229026#comment-13229026 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Comment crossing... We had the same thought. :)
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233896#comment-13233896 ] 

stack commented on HBASE-5569:
------------------------------

Ok.  Thanks for the explanation.  +1 on commit.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232765#comment-13232765 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

If some more folks would run the tests in a loop with the patch applied that'd be of great help.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233845#comment-13233845 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

@Stack:
So HBASE-2856 has logic that prevents a KV from being removed in a flush or compaction when it has expired (due to TTL or too many versions) but there is still a scanner open with a readpoint <= the KV's memstoreTS (which means the KV was created after the scanner was opened).
Say you have set your store to keep 3 versions. Now you create 10 versions of a KV; the extra 7 versions are not removed during a flush or compaction while a scanner that was opened before the KVs were created is still open.
This patch adds the same for deleted KVs (IMHO that is something that HBASE-2856 missed). So now expired and deleted KVs are not collected if a scanner could still access them.

It means that a flush or compaction needs to copy these KVs to the new store file instead of skipping them. This only happens for KVs that were created (or now deleted) after the scanner(s) were opened.
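
The rule described above can be sketched in a few lines of Java. This is only a toy model following the wording of this comment, not the code in the patch: the class RetentionSketch, the Kv holder, its "obsolete" flag (standing for "expired or deleted") and the smallestReadPoint parameter are all names made up for illustration, and the sketch glosses over how the real flush and compaction code decides that a KV is expired or deleted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the retention rule; hypothetical names, not HBase classes. */
final class RetentionSketch {

  /** Hypothetical stand-in for a KeyValue: only the fields the rule needs. */
  static final class Kv {
    final String key;
    final long memstoreTs;   // MVCC number assigned when the KV was written
    final boolean obsolete;  // assumption: true if expired or deleted

    Kv(String key, long memstoreTs, boolean obsolete) {
      this.key = key;
      this.memstoreTs = memstoreTs;
      this.obsolete = obsolete;
    }
  }

  /**
   * An obsolete KV must be kept while some open scanner has a
   * readpoint <= the KV's memstoreTS (the condition quoted in the comment).
   */
  static boolean mustRetain(Kv kv, long smallestReadPoint) {
    return kv.obsolete && smallestReadPoint <= kv.memstoreTs;
  }

  /** Returns the KVs a flush or compaction would copy to the new store file. */
  static List<Kv> flush(List<Kv> memstore, long smallestReadPoint) {
    List<Kv> out = new ArrayList<>();
    for (Kv kv : memstore) {
      if (!kv.obsolete || mustRetain(kv, smallestReadPoint)) {
        out.add(kv);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Kv> memstore = Arrays.asList(
        new Kv("rowA/qual1/v1", 5, true),    // obsolete, older than every scanner
        new Kv("rowA/qual1/v2", 12, true),   // obsolete, newer than an open scanner
        new Kv("rowA/qual1/v3", 13, false)); // live

    // Oldest open scanner has readpoint 10: v2 is copied along with v3.
    System.out.println(flush(memstore, 10).size()); // prints 2
    // Oldest open scanner has readpoint 20: only the live KV survives.
    System.out.println(flush(memstore, 20).size()); // prints 1
  }
}
```

The extra cost is visible in the first call: the obsolete v2 is written to the new file only because a scanner that predates it is still open.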

The output I added is the memstoreTS. The ts is already part of toString.

I don't think that needs to be part of the release notes (at least we did not add this to the release notes for HBASE-2856 or its backport).

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231816#comment-13231816 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Interestingly I saw this right before:
{quote}
2012-03-16 19:24:30,523 DEBUG [Thread-46] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowA/colfamily11:qual1/2561/DeleteColumn/vlen=0,and after = rowA/colfamily11:qual1/2561/DeleteColumn/vlen=0
{quote}
Which makes no sense, because before and after are the same KV.

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228554#comment-13228554 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Ok... Failed locally once now as well.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229037#comment-13229037 ] 

chunhui shen commented on HBASE-5569:
-------------------------------------

Since the MVCC readpoint guards against a scanner seeing KVs beyond its readpoint, why could we see two rows?
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233612#comment-13233612 ] 

Zhihong Yu commented on HBASE-5569:
-----------------------------------

+1 from me.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228225#comment-13228225 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

I wonder if this has to do with HBASE-5568?
I have multiple threads here that flush the same HRegion directly.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231833#comment-13231833 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

This must be some strange timing issue since it *never* happens on my fast work machine.
I think I'll revert the change until I understand this better.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229676#comment-13229676 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Test is still running in a loop, hasn't failed, yet.
I'll do some more performance tests to make sure this is only slowed down when needed (I mean when scans are being performed).
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229023#comment-13229023 ] 

chunhui shen commented on HBASE-5569:
-------------------------------------

Is there any possibility that region.flush breaks the rule that each scanner should only see KVs according to its MVCC readpoint?
I ask because in the current flush logic, KVs are dropped during the flush if a marker of delete type covers them.

                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>

[jira] [Updated] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Description: 
What I pieced together so far is that it is the *scanning* side that has problems sometimes.

Every time I see a assertion failure in the log I see this before:
{quote}
2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
{quote}
The order of if the Put and Delete is sometimes reversed.

The test threads should always see exactly one KV, if the "before" was the Put the thread see 0 KVs, if the "before" was the Delete the threads see 2 KVs.

This debug message comes from StoreScanner to checkReseek. It seems we still some consistency issue with scanning sometimes :(


  was:
What I pieces together so far is that it is the *scanning* side that has problems sometimes.

Every time I see a assertion failure in the log I see this before:
{quote}
2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
{quote}
The order of if the Put and Delete is sometimes reversed.

The test threads should always see exactly one KV, if the "before" was the Put the thread see 0 KVs, if the "before" was the Delete the threads see 2 KVs.

This debug message comes from StoreScanner to checkReseek. It seems we still some consistency issue with scanning sometimes :(


    
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232349#comment-13232349 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Here's my theory...
In ScanQueryMatcher we have this:
{code}
byte type = kv.getType();
if (kv.isDelete()) {
  if (!keepDeletedCells) {
    ...
    this.deletes.add(bytes, offset, qualLength, timestamp, type);
  }
  ...
} else if (!this.deletes.isEmpty()) {
  DeleteResult deleteResult = deletes.isDeleted(bytes, offset, qualLength,
      timestamp);
  ...
}
{code}
And in StoreScanner.resetScannerStack
{code}
// Reset the state of the Query Matcher and set to top row.
// Only reset and call setRow if the row changes; avoids confusing the
// query matcher if scanning intra-row.
...
if ((matcher.row == null) || !kv.matchingRow(matcher.row)) {
  matcher.reset();
  matcher.setRow(kv.getRow());
}
{code}
So, the SQM might already have a delete registered, or might miss a delete.
With KEEP_DELETED_CELLS that race does not happen, because deletes are simply not registered.
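
To make the two outcomes concrete, here is a toy model in Java. It is not ScanQueryMatcher and it does not reproduce the race itself; MatcherSketch, onReseek, registerDelete and includePut are names invented for this sketch. It assumes only what the snippets above show (matcher state is cleared only when the row changes) plus the usual DeleteColumn semantics (versions with a timestamp at or below the marker's are masked).

```java
/** Toy model of a matcher whose delete state survives a re-seek in the same row. */
final class MatcherSketch {
  private String row;                // row the matcher is currently positioned on
  private long deleteColumnTs = -1;  // newest DeleteColumn timestamp registered for this row

  /** Mirrors the quoted guard: state is cleared only when the row changes. */
  void onReseek(String topRow) {
    if (row == null || !row.equals(topRow)) {
      row = topRow;
      deleteColumnTs = -1;
    }
  }

  /** Registers a DeleteColumn marker seen while scanning the current row. */
  void registerDelete(long ts) {
    deleteColumnTs = Math.max(deleteColumnTs, ts);
  }

  /** A Put is emitted unless a registered DeleteColumn covers its timestamp. */
  boolean includePut(long ts) {
    return ts > deleteColumnTs;
  }

  public static void main(String[] args) {
    // Case 1: a delete was registered before the re-seek and is carried over.
    MatcherSketch carried = new MatcherSketch();
    carried.onReseek("rowB");
    carried.registerDelete(75203);
    carried.onReseek("rowB");                       // same row, so no reset
    System.out.println(carried.includePut(75100));  // prints false: the Put stays masked

    // Case 2: the scanner resumes past the marker, so the delete is never registered.
    MatcherSketch missed = new MatcherSketch();
    missed.onReseek("rowB");
    System.out.println(missed.includePut(75100));   // prints true: the Put leaks through
  }
}
```

Whether a Put is returned depends entirely on what the surviving matcher state happens to contain at the moment of the re-seek, which is why the result can differ from run to run.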

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>
> I noticed this because TestAtomicOperation.testMultiRowMutationMultiThreads fails rarely.
> The solution is similar to HBASE-2856, where expired KVs are not collected when in use by a scanner.
> ---
> What I pieced together so far is that it is the *scanning* side that has problems sometimes.
> Every time I see an assertion failure in the log I see this before:
> {quote}
> 2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
> {quote}
> The order of the Put and Delete is sometimes reversed.
> The test threads should always see exactly one KV. If the "before" was the Put the threads see 0 KVs; if the "before" was the Delete the threads see 2 KVs.
> This debug message comes from StoreScanner.checkReseek. It seems we still have some consistency issue with scanning sometimes :(


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228211#comment-13228211 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

bq. Are these the failures we saw up on builds.apache.org? There was a fail in hadoopqa too. You including that?

Yep those are the ones. I recall now that I have occasionally seen these before.

bq. Or is this a bug we've introduced recently?

Possible, but I do not think that is likely.
Maybe the test code is not valid?
Or maybe there is more work to do for multi-row transactions and scanners do not yet see Puts and Deletes atomically across multiple rows...?
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>

        

[jira] [Issue Comment Edited] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230207#comment-13230207 ] 

Lars Hofhansl edited comment on HBASE-5569 at 3/15/12 2:42 PM:
---------------------------------------------------------------

Ran all tests in TestAtomicOperation more than 3000 times without a failure.
                
      was (Author: lhofhansl):
    Ran all test in TestAtomicOperation more that 3000 times without a failure.
                  
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228553#comment-13228553 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Are my assumptions about scanning wrong here?

The test works as follows:
A bunch of threads alternate putting a column on RowA and deleting that column on RowB in a transaction (next time the delete is on RowA and the put on RowB).
Then they each scan starting with RowA and expect to always see exactly one KV (either the column in RowA or the one in RowB).

So this relies on a scan providing an atomic view over the two rows (which I think should work if both RowA and RowB are rolled forward with the same MVCC writepoint).
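
A minimal sketch of that assumption, using a toy in-memory model rather than HBase's real classes (ToyRegion, mutateRowsAtomically, openScanner and scan are hypothetical names). The toy serializes everything behind one lock, so it only illustrates the read-point rule, not the concurrency of a real region.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not HBase's MVCC implementation.
final class ToyRegion {
    static final class Cell {
        final String row;
        final boolean isDelete;
        final long writePoint;

        Cell(String row, boolean isDelete, long writePoint) {
            this.row = row;
            this.isDelete = isDelete;
            this.writePoint = writePoint;
        }
    }

    private final List<Cell> cells = new ArrayList<>();
    private long nextWritePoint = 1;
    private long readPoint = 0; // newest write point visible to newly opened scanners

    // Put on one row and delete on the other under a single write point,
    // then roll the read point forward once both edits are in place.
    synchronized void mutateRowsAtomically(String putRow, String deleteRow) {
        long writePoint = nextWritePoint++;
        cells.add(new Cell(putRow, false, writePoint));
        cells.add(new Cell(deleteRow, true, writePoint));
        readPoint = writePoint;
    }

    // A scanner remembers the read point that was current when it was opened.
    synchronized long openScanner() {
        return readPoint;
    }

    // Count rows whose newest cell visible at the scanner's read point is a Put.
    synchronized int scan(long scannerReadPoint) {
        int visible = 0;
        for (String row : List.of("rowA", "rowB")) {
            Cell newest = null;
            for (Cell cell : cells) {
                boolean visibleToScanner = cell.writePoint <= scannerReadPoint;
                if (cell.row.equals(row) && visibleToScanner
                        && (newest == null || cell.writePoint > newest.writePoint)) {
                    newest = cell;
                }
            }
            if (newest != null && !newest.isDelete) {
                visible++;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.mutateRowsAtomically("rowA", "rowB");
        long oldScanner = region.openScanner();
        region.mutateRowsAtomically("rowB", "rowA");
        System.out.println(region.scan(oldScanner));
        System.out.println(region.scan(region.openScanner()));
    }
}
```

Both scans in main report exactly one visible KV: the older scanner still sees the column in rowA, the newer one sees it in rowB.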

                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228982#comment-13228982 ] 

chunhui shen commented on HBASE-5569:
-------------------------------------

The check that issues the above DEBUG message (Storescanner.peek() is changed...) was added as part of HBASE-5121.
The message may also be printed after a flush.

This DEBUG message means that StoreScanner.peek() changed after a compaction, a flush, or anything else that calls Store.notifyChangedReadersObservers().
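
The notification path described above can be sketched roughly as follows. This is a toy model with hypothetical names (ToyStore, ToyScanner, replaceKeys) and sorted strings standing in for KVs; it is not the HBASE-5121 code. The scanner remembers what it last peeked, and after the store swaps its underlying data it re-seeks and records a message if the top entry differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: hypothetical names, not HBase's Store or StoreScanner.
final class ToyStore {
    interface ChangedReadersObserver {
        void updateReaders();
    }

    private final List<ChangedReadersObserver> observers = new ArrayList<>();
    private TreeSet<String> keys = new TreeSet<>();

    void addObserver(ChangedReadersObserver observer) {
        observers.add(observer);
    }

    TreeSet<String> currentKeys() {
        return keys;
    }

    // Stand-in for a flush or compaction: swap the data, then notify open scanners.
    void replaceKeys(TreeSet<String> newKeys) {
        keys = newKeys;
        for (ChangedReadersObserver observer : observers) {
            observer.updateReaders();
        }
    }

    static final class ToyScanner implements ChangedReadersObserver {
        private final ToyStore store;
        private String lastPeek;
        final List<String> log = new ArrayList<>();

        ToyScanner(ToyStore store, String startKey) {
            this.store = store;
            this.lastPeek = store.currentKeys().ceiling(startKey);
            store.addObserver(this);
        }

        String peek() {
            return lastPeek;
        }

        // Re-seek to the previously peeked key and compare with what is there now.
        @Override
        public void updateReaders() {
            String before = lastPeek;
            String after = before == null ? null : store.currentKeys().ceiling(before);
            if (before != null && !before.equals(after)) {
                log.add("peek() is changed where before = " + before + ", and after = " + after);
            }
            lastPeek = after;
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.replaceKeys(new TreeSet<>(List.of("rowA/qual1", "rowB/qual1")));
        ToyScanner scanner = new ToyScanner(store, "rowB");
        store.replaceKeys(new TreeSet<>(List.of("rowA/qual1", "rowC/qual1")));
        scanner.log.forEach(System.out::println);
    }
}
```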


                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>

       

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232386#comment-13232386 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

@N.: Note that the patch also reduces the number of threads for the test[Multi]RowMutationMultipleThreads and increases the rate of flushes per thread.
This made the tests (far) more likely to fail on my home machine; it might be different on your machine.

I should note that on my home machine both tests now fail every time, but do not with this patch.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234134#comment-13234134 ] 

Hudson commented on HBASE-5569:
-------------------------------

Integrated in HBase-TRUNK-security #144 (See [https://builds.apache.org/job/HBase-TRUNK-security/144/])
    HBASE-5569 Do not collect deleted KVs when they are still in use by a scanner. (Revision 1303220)

     Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229025#comment-13229025 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

But here's a thought. Unless KEEP_DELETED_CELLS is set to true for a store, a flush will unconditionally purge all deleted rows (I put in that optimization myself :) )... That might be a hole in HBASE-2856, since this was never needed.
HBASE-2856 delays expiration of KVs until all scans are finished, but it does not do this for deleted cells.
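
A rough sketch of what extending that delay to deleted cells could look like, in a toy model (ToyFlush, flush and smallestReadPoint are hypothetical names; this is not the actual patch). A delete marker, and the cells it masks, are only dropped at flush time once no open scanner has a read point older than the marker.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not the actual HBASE-5569 change.
final class ToyFlush {
    static final class Cell {
        final String key;
        final boolean isDelete;
        final long writePoint;

        Cell(String key, boolean isDelete, long writePoint) {
            this.key = key;
            this.isDelete = isDelete;
            this.writePoint = writePoint;
        }
    }

    // Returns the cells that survive a flush.
    // Assumes cells for one key are sorted newest first, so a delete marker
    // comes before the older cells it masks.
    static List<Cell> flush(List<Cell> cells, long smallestReadPoint) {
        List<Cell> kept = new ArrayList<>();
        String purgedKey = null; // key masked by a delete marker that was safe to purge
        for (Cell cell : cells) {
            if (cell.isDelete) {
                if (cell.writePoint > smallestReadPoint) {
                    // Some open scanner cannot see this delete yet: keep the marker.
                    kept.add(cell);
                } else {
                    // Every open scanner already sees the delete: drop it and what it masks.
                    purgedKey = cell.key;
                }
            } else if (!cell.key.equals(purgedKey)) {
                kept.add(cell);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("rowB/qual1", true, 3),
                new Cell("rowB/qual1", false, 1));
        System.out.println(flush(cells, 2).size());
        System.out.println(flush(cells, 3).size());
    }
}
```

With a scanner still open at read point 2, both the marker and the Put survive the flush (2 cells); once the smallest read point reaches 3, both are purged (0 cells).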

I'm trying now with KEEP_DELETED_CELLS enabled to see if I can still reproduce this problem.

                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229541#comment-13229541 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Specifically the test deletes a lot of KVs that are still part of a scan and hence cannot be removed, so I think this is ok as far as this test goes.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229044#comment-13229044 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

That is the question ;)
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228592#comment-13228592 ] 

stack commented on HBASE-5569:
------------------------------

hbase-5121 does mess w/ scanners... Seems like pretty issue though, what hbase-5121 is trying to solve.  Pity it's so hard verifying this started the failures, else we could back it out for now.  Should we back it out anyway and see if we get failures over the next few days?
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229539#comment-13229539 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

I notice the test takes much longer to complete now. Before the change it took 11s, now it takes about 90s.
That might just be the nature of the test, as it deletes and puts a *lot*, and the actual removal of the deleted KVs is delayed (just as it is for expired KVs).

                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232380#comment-13232380 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Thanks Stack.
TestKeyValue is a simple fix (because I changed the output of KV.toString()).
TestCompaction looks worrisome, checking it out now.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228236#comment-13228236 ] 

stack commented on HBASE-5569:
------------------------------

Ugh.  Indexing JIRA lost my comment.

Looking at builds, we don't have much of a history on trunk builds but TestAtomicOperation started failing today when "HBASE-5399 Cut the link between the client and the zookeeper ensemble" went in (among others).  I see over in hadoopqa builds that it doesn't fail if I go back twenty odd builds.  It did break here, https://builds.apache.org/view/G-L/view/HBase/job/PreCommit-HBASE-Build/1168/, and on a later build.  Should I try reverting it?
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>

        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229021#comment-13229021 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Each scanner should only see KVs according to its mvcc readpoint.
What you describe could also happen with KVs "inside" the same row, and the mvcc readpoint guards against this.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233569#comment-13233569 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

One more +1 anyone?
I think this is an important feature.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569-v4.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>
> I noticed this because TestAtomicOperation.testMultiRowMutationMultiThreads fails rarely.
> The solution is similar to HBASE-2856, where expired KVs are not collected when in use by a scanner.


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229830#comment-13229830 ] 

Hadoop QA commented on HBASE-5569:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518404/5569-v2.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1190//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1190//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1190//console

This message is automatically generated.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Updated] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5569:
---------------------------------

    Fix Version/s: 0.96.0
                   0.94.0
    
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231825#comment-13231825 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Added some extra logging. It turns out that the "after" KV always has memstoreTS=0.
I have to conclude that this is not fixed yet.
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "nkeywal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229035#comment-13229035 ] 

nkeywal commented on HBASE-5569:
--------------------------------

There's no message about Storescanner.peek, nor any error or warning. Here's the log when it fails:

{noformat}
2012-03-14 03:14:02,146 DEBUG [Thread-51] regionserver.TestAtomicOperation$1(305): keyvalues=NONE
Exception in thread "Thread-51" 
junit.framework.AssertionFailedError
	at junit.framework.Assert.fail(Assert.java:48)
	at junit.framework.Assert.fail(Assert.java:56)
	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$1.run(TestAtomicOperation.java:307)
2012-03-14 03:14:02,228 DEBUG [Thread-92] regionserver.TestAtomicOperation$1(279): flushing
{noformat}

Reproduced on the Feb 24th trunk as well, after ~700 iterations, with the same logs.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) Do not collect deleted KVs when they are still in use by a scanner.

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232384#comment-13232384 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

Thanks N.!
                
> Do not collect deleted KVs when they are still in use by a scanner.
> -------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5569-v2.txt, 5569-v3.txt, 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Issue Comment Edited] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228998#comment-13228998 ] 

Lars Hofhansl edited comment on HBASE-5569 at 3/14/12 6:29 AM:
---------------------------------------------------------------

Well... The whole point of the new API was to have atomic operations.
The Put and the Delete are executed atomically together and become visible at the same time.
Note that the code alternates between putting row and deleting row2, and then putting row2 and deleting row. The scan then ensures that exactly one column is visible.

In this case the scan *itself* is inconsistent. Worse, as Nicolas (N) found out, even testRowMutationMultiThreads fails sometimes, and that involves just a single row, so it should never happen.

So I am not entirely convinced the test is at fault.

For example, take the scenario described above: if
{code}
// Build one multi-row mutation: Put on row2, Delete on row, same timestamp
Put p = new Put(row2, ts);
p.add(fam1, qual1, value1);
mrm.add(p);
Delete d = new Delete(row);
d.deleteColumns(fam1, qual1, ts);
mrm.add(d);
{code}
happened between
{code}
region.mutateRowsWithLocks(mrm, rowsToLock);
{code}

and
{code}
Scan s = new Scan(row);
RegionScanner rs = region.getScanner(s);
List<KeyValue> r = new ArrayList<KeyValue>();
// Drain the scanner; all KVs end up in r
while(rs.next(r));
{code}

both the Put and the Delete would happen atomically with the same WALEdit and the same MVCC writepoint. So the scan will now see the other row (it sees either row or row2, because row -RowA- sorts before row2 -RowB-).
This has nothing to do with race conditions between threads, but only occurs with flushes in the test. I'll remove the forced flushes and then run the test again.
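
The invariant the test relies on can be sketched with a toy model (hypothetical names throughout, not HBase code and not a diagnosis of the failure). It assumes that mutation n writes a Put on one row and a DeleteColumn on the other with the same timestamp and the same MVCC write number, that a scanner only sees cells whose write number is at or below its read point, and that a DeleteColumn masks all versions at or below its timestamp.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the test's invariant; hypothetical names, not HBase code. */
final class AtomicMutationSketch {

    enum Type { PUT, DELETE_COLUMN }

    /** Simplified cell: one column per row, a timestamp, and the MVCC write number. */
    static final class Cell {
        final String row;
        final Type type;
        final long ts;
        final long memstoreTS;

        Cell(String row, Type type, long ts, long memstoreTS) {
            this.row = row;
            this.type = type;
            this.ts = ts;
            this.memstoreTS = memstoreTS;
        }
    }

    /** Adds mutation n: a Put on one row and a DeleteColumn on the other, sharing ts and write number. */
    static void mutate(List<Cell> store, long n) {
        boolean odd = (n % 2 == 1);
        store.add(new Cell(odd ? "rowA" : "rowB", Type.PUT, n, n));
        store.add(new Cell(odd ? "rowB" : "rowA", Type.DELETE_COLUMN, n, n));
    }

    /** Counts Puts that are visible at readPoint and not masked by a visible DeleteColumn. */
    static int liveCount(List<Cell> store, long readPoint) {
        int live = 0;
        for (Cell put : store) {
            if (put.type != Type.PUT || put.memstoreTS > readPoint) {
                continue;
            }
            boolean masked = false;
            for (Cell del : store) {
                // A DeleteColumn masks all versions of the column at or below its timestamp.
                if (del.type == Type.DELETE_COLUMN && del.memstoreTS <= readPoint
                        && del.row.equals(put.row) && del.ts >= put.ts) {
                    masked = true;
                    break;
                }
            }
            if (!masked) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<Cell> store = new ArrayList<>();
        for (long n = 1; n <= 3; n++) {
            mutate(store, n);
        }
        for (long readPoint = 1; readPoint <= 3; readPoint++) {
            System.out.println("readPoint " + readPoint + ": " + liveCount(store, readPoint) + " live KV(s)");
        }
    }
}
```

As long as both halves of a mutation carry the same write number, every read point in this model yields exactly one live KV. If one half became visible to a scanner before the other (for example, a cell whose write number reads as 0 while its partner's does not), the same scan would count 0 or 2, which matches the two failure shapes in the issue description.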

                
                  
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>         Attachments: TestAtomicOperation-output.trunk_120313.rar
>
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228498#comment-13228498 ] 

Lars Hofhansl commented on HBASE-5569:
--------------------------------------

I'll run this in a loop on my work machine (8 cores + hyperthreading), which should increase the likelihood of this happening.
I will then avoid the parallel flushing and see if that fixes the problem.

I think the test always had this problem. On the other hand, I do think this indicates a problem with scanning.
This is suspicious, and the code producing this was also added relatively recently:
{quote}
Storescanner.peek() is changed where before = rowB/colfamily11:qual1/75366/Put/vlen=6,and after = rowB/colfamily11:qual1/75203/DeleteColumn/vlen
{quote}

                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Minor
>


        

[jira] [Commented] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229549#comment-13229549 ] 

Hadoop QA commented on HBASE-5569:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518357/5569.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1188//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1188//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1188//console

This message is automatically generated.
                
> TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
> -----------------------------------------------------------------------
>
>                 Key: HBASE-5569
>                 URL: https://issues.apache.org/jira/browse/HBASE-5569
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 5569.txt, TestAtomicOperation-output.trunk_120313.rar
>
>
