You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2012/08/12 23:26:37 UTC

[jira] [Created] (HBASE-6561) Gets/Puts with many column send the RegionServer into an "endless" loop

Lars Hofhansl created HBASE-6561:
------------------------------------

Summary: Gets/Puts with many column send the RegionServer into an "endless" loop
Key: HBASE-6561
URL: https://issues.apache.org/jira/browse/HBASE-6561
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.96.0, 0.94.2

This came from the mailing this:
We were able to replicate this behavior in a pseudo-distributed hbase
(hbase-0.94.1) environment. We wrote a test program that creates a test
table "MyTestTable" and populates it with random rows, then it creates a
row with 60,000 columns and repeatedly updates it. Each column has a 18
byte qualifier and a 50 byte value. In our tests, when we ran the
program, we usually never got beyond 15 updates before it would flush
for a really long time. The rows that are being updated are about 4MB
each (minues any hbase metadata).

It doesn't seem like it's caused by GC. I turned on gc logging, and
didn't see any long pauses. This is the gc log during the flush.
http://pastebin.com/vJKKXDx5

This is the regionserver log with debug on during the same flush
http://pastebin.com/Fh5213mg

This is the test program we wrote.
http://pastebin.com/aZ0k5tx2

You should be able to just compile it, and run it against a running
HBase cluster.
$ java TestTable

Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448320#comment-13448320 ] 

Hudson commented on HBASE-6561:
-------------------------------

Integrated in HBase-0.94-security-on-Hadoop-23 #7 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/7/])
    HBASE-6561 Gets/Puts with many columns send the RegionServer into an 'endless' loop (Revision 1373951)

     Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439498#comment-13439498 ] 

Hudson commented on HBASE-6561:
-------------------------------

Integrated in HBase-0.94-security #48 (See [https://builds.apache.org/job/HBase-0.94-security/48/])
    HBASE-6561 Gets/Puts with many columns send the RegionServer into an 'endless' loop (Revision 1373951)

     Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434044#comment-13434044 ] 

nkeywal commented on HBASE-6561:
--------------------------------

If I understand well the problem you're addressing, we may have a reseek asking to go before the point we have already reached by the next()?

In your implementation, do you have to keep the row pointed by the iterator? You could use the existing iterator with next(), as the iterators will be recreated at the end of the reseek?


                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434815#comment-13434815 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

The new patch fails these tests:
Failed tests:   testFlushCacheWhileScanning(org.apache.hadoop.hbase.regionserver.TestHRegion): i=20 expected:<2> but was:<1>
  testWritesWhileGetting(org.apache.hadoop.hbase.regionserver.TestHRegion): expected:<0> but was:<2>

I think relying on the iterators is too brittle. I'll go back to the approach of explicitly saving the iterator state.

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Status: Patch Available  (was: Open)

Let's try hadoop qa again.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434280#comment-13434280 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

You cannot get the current state of an iterator without advancing it, that's why I think I need to hold on to the current "state" of the iterator.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432862#comment-13432862 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

Looking at the code again, I am actually not sure anymore why building tailSet in MemStoreScanner and then calling next on the iterators would be scanning through so many more KVs with newer memstoreTSs. There might be something else at play here.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Attachment: 6561-0.96-v4.txt

Attaching patch again
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433269#comment-13433269 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

That would already be the case if we just iterated with next().
I think the reasoning is that no new KVs with a older memstoreTS can possible be inserted later, and a scan is not guaranteed to pick up KVs that were inserted later anyway, so it should be OK.
I'll make a patch with that soon.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434897#comment-13434897 ] 

Hadoop QA commented on HBASE-6561:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12541008/6561-0.96-v5.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.replication.TestReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2580//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2580//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2580//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2580//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2580//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2580//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2580//console

This message is automatically generated.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436434#comment-13436434 ] 

Hudson commented on HBASE-6561:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #132 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/132/])
    HBASE-6561 Gets/Puts with many columns send the RegionServer into an 'endless' loop (Revision 1373943)

     Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Compactor.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434013#comment-13434013 ] 

Hadoop QA commented on HBASE-6561:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12540804/6561-0.96-v3.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2565//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2565//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2565//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2565//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2565//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2565//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2565//console

This message is automatically generated.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (HBASE-6561) Gets/Puts with many column send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432841#comment-13432841 ] 

Lars Hofhansl edited comment on HBASE-6561 at 8/13/12 8:37 AM:
---------------------------------------------------------------

When I instrument the code slightly further I see that MemStoreScanner.getNext(...) has to skip over a large number of KVs with a higher memstoreTS (many millions), which causes the complete unresponsiveness of the RS (I tried to let the RS run for a while after I stopped the Puts/Gets from the client, but after 30mins I just killed it).

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached); and I do not see this behavior.

It seems the problem here stems from the excessive reseeking in this case. The StoreScanner is smart and instructs its KeyValueHeap to seek to the next column (which can potentially skip many columns). Only in this case, this is in fact not so smart, because reseek needs to reset the current iterators, and hence all KV with higher memstoreTS have to be searched again (see also comment in MemStoreScanner.reseek).

There's is no way (that I found) to isolate this issue before it happens.
I have a patch where in reseek the MemStoreScanner opportunistically iterates "a bit" (1000 iterations in my patch), and only then actually reseeks forward.
This eliminates the problem for me.
I have no good numbers of when it is more efficient to seek ahead (and skip many versions, columns, or rows) (i.e. how many iterations the MemStoreScanner should attempt before seeking).

Edit: Spelling.
                
      was (Author: lhofhansl):
    When I instrument the code slight further I see that MemStoreScanner.getNext(...) has to skip over a large number of KVs with a higher memstoreTS (many millions), which causes the complete unresponsiveness of the RS (I tried that let the RS running for a while after I stopped the Puts/Gets from the client, but after 30mins I just killed it).

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached); and I do not see this behavior.

It seems the problem here stems from the excessive reseeking in this case. The StoreScanner is smart is smart and instructs its KeyValueHeap to seek to the next column (which can potentially skip many columns). Only in this case, this is in fact not so smart, because reseek need to reset the current iterators, and hence all KV with higher memstoreTS have to be searched again (see also comment in MemStoreScanner.reseek).

There's is no way (that I found) to isolate this issue before it happens.
I have a patch where in reseek the MemStoreScanner opportunistically iterates "a bit" (1000 iterations in my patch), and only then actually reseeks forward.
This eliminates the problem for me.
I have no good numbers of when it is more efficient to seek ahead (and skip many versions, columns, or rows) (i.e. how many iterations the MemStoreScanner should attempt before seeking).

                  
> Gets/Puts with many column send the RegionServer into an "endless" loop
> -----------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433702#comment-13433702 ] 

Hadoop QA commented on HBASE-6561:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12540754/6561-0.96-v2.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2558//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2558//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2558//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2558//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2558//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2558//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2558//console

This message is automatically generated.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to 0.94 and 0.96.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433221#comment-13433221 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

Another option is to have the MemStoreScanner memorize the last key it iterated to (for both the kvSet and tailSet) and use that to build tailSet first in reseek (i.e. the closest we can get to save the iterator state).
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many column send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Attachment: 6561-0.96.txt

And a trunk patch.
Please take a careful look. I'm also very open for better suggestions.

In the long run, we should probably use a different data structure for the MemStoreScanner.
                
> Gets/Puts with many column send the RegionServer into an "endless" loop
> -----------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433682#comment-13433682 ] 

Zhihong Ted Yu commented on HBASE-6561:
---------------------------------------

So with latest patch region server becomes responsive, I assume.
{code}
+    // last iterated KVs for kvser and snapshot (to restore iterator state after reseek)
{code}
'kvser' -> 'kvset'
{code}
+    protected KeyValue getHighest(KeyValue first, KeyValue second) {
{code}
The above method can be private, right ?
Since only two keyvalues are involved, maybe call it getHigher() ?
{code}
+      int compactionKVMax = conf.getInt("hbase.hstore.compaction.kv.max", 10);
{code}
The above constant appears in Compactor.java
Consider introducing a constant and reference the constant.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434376#comment-13434376 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

Oh, I see what you are saying. Because we're seeking forward it is OK to go to the immediate next key of both iterators because it'll never be smaller than what we are seeking to? I could buy that if we never reseek to the current top key.

Also need an extra check then for whether the iterator is exhausted and ignore a call to reseek then.

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435652#comment-13435652 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

Going to commit tomorrow.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432954#comment-13432954 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

An idea would be to have a special comparator that would always sort by memstoreTS first (reverse) and use that to sort the KeyValueSkipListSet. That all new kvs can be skipped cheaply.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Summary: Gets/Puts with many columns send the RegionServer into an "endless" loop  (was: Gets/Puts with many column send the RegionServer into an "endless" loop)
    
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436221#comment-13436221 ] 

Hudson commented on HBASE-6561:
-------------------------------

Integrated in HBase-0.94 #400 (See [https://builds.apache.org/job/HBase-0.94/400/])
    HBASE-6561 Gets/Puts with many columns send the RegionServer into an 'endless' loop (Revision 1373951)

     Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433763#comment-13433763 ] 

Zhihong Ted Yu commented on HBASE-6561:
---------------------------------------

The fact that patch v2 passed unit test suite gave me confidence in the approach.

+1 from me.

Of course, review from Mikhail, et. al would be appreciated.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many column send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Status: Patch Available  (was: Open)
    
> Gets/Puts with many column send the RegionServer into an "endless" loop
> -----------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433203#comment-13433203 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

bq. So, we seek to column X+1, we need to skip all versions (because they have readpoint > 1) – so we keep reseeking until the end. That'd be crazy for sure.

That's exactly what I am seeing.

bq. Why does the Put throw off the Get? Because memStoreTS gets updated during the Get?

Somehow there's an older Get that did not finish (or is retried?), and the Puts keep piling on versions that need to be skipped.

BTW. The idea with sorting leading with memstoreTS does not work, because for flush we need them in normal key order.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Attachment: 6561-0.96-v4.txt

New patch. Uses the iterators directly.
Also does away with kvTail and snapshotTail.
(To N.'s question: Building the tail of a tail repeatedly with KeyValueSkiplistSet, does not really help, in fact it just makes it worse).

The stalling the RS is still gone. Let's see what HadoopQA says.
Again, please have a careful, this stuff is tricky.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433728#comment-13433728 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

Thanks Ted. getHighest was really copied from getLowest :) Open for best suggestions to combine the two without bad readability. I can make both of them private.
Will fix the spelling, and introduce a constant for the setting.

Do you see any logical problem with the patch. It should work because is scanner is forward only unless seek() is used. I would still consider this a somewhat risky change, though.
I'd like to get some comments on this new logic (and some +1's :) )
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Attachment: 6561-0.96-v2.txt

Here's a better patch. A MemStoreScanner now remembers the last KV that was ever iterated to and makes sure we never go backwards from that in a reseek (reseeks must be going forward).
The other observation is that taking continuous tailSets of tailSets will make things worse (from looking the code... the amount of work is the same, but through more layers of indirection).

As before, please have a careful look, this is a pretty core part of HBase.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many column send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Attachment: 6561-0.94.txt

Here's a 0.94 (that where the problem was reported) patch that does what I describe above. Can't say I like it.
                
> Gets/Puts with many column send the RegionServer into an "endless" loop
> -----------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Attachment: 6561-0.96-v5.txt

Almost the same as v3 (not v4). Also does away with kvTail snapshotTail, as they are not needed.

This should be close to final patch.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432941#comment-13432941 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

A little more detail. When this happens I see the following pattern:

# a seek to column X (at readpoint 1)
# all versions of *all* columns > X only have a readpoint > 1, hence all need to be skipped
# goto 1. with column X+1, still at readpoint 1, until we exhausted all columns

So for each KVs we reseek to, we skip over all KVs larger than this KV. This leads to many millions (hundreds of millions) of KVs that are needlessly skipped multiple times.
MemStoreScanner.getNext() simply does not find a single KV with the right readpoint and iterates all the way to end (and does so again for each reseek).

This is very pathological scenario. Somehow a previous Get is not finished before the next Put inserts (see the sample code in pastebin in the description), which seems impossible.

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433226#comment-13433226 ] 

stack commented on HBASE-6561:
------------------------------

bq. Another option is to have the MemStoreScanner memorize the last key it iterated to (for both the kvSet and tailSet) and use that to build tailSet first in reseek (i.e. the closest we can get to save the iterator state).

That could work.  I was worried something could be inserted between our get of last key and build of tailset but should be fine since the read point will be lagging the write point used by the insertion?
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Status: Open  (was: Patch Available)
    
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434819#comment-13434819 ] 

Hadoop QA commented on HBASE-6561:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12541000/6561-0.96-v4.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestHRegion
                  org.apache.hadoop.hbase.regionserver.TestAtomicOperation
                  org.apache.hadoop.hbase.TestAcidGuarantees

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2576//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2576//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2576//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2576//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2576//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2576//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2576//console

This message is automatically generated.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432861#comment-13432861 ] 

Hadoop QA commented on HBASE-6561:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12540593/6561-0.96.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2550//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2550//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2550//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2550//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2550//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2550//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2550//console

This message is automatically generated.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many column send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432841#comment-13432841 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

When I instrument the code slight further I see that MemStoreScanner.getNext(...) has to skip over a large number of KVs with a higher memstoreTS (many millions), which causes the complete unresponsiveness of the RS (I tried that let the RS running for a while after I stopped the Puts/Gets from the client, but after 30mins I just killed it).

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached); and I do not see this behavior.

It seems the problem here stems from the excessive reseeking in this case. The StoreScanner is smart is smart and instructs its KeyValueHeap to seek to the next column (which can potentially skip many columns). Only in this case, this is in fact not so smart, because reseek need to reset the current iterators, and hence all KV with higher memstoreTS have to be searched again (see also comment in MemStoreScanner.reseek).

There's is no way (that I found) to isolate this issue before it happens.
I have a patch where in reseek the MemStoreScanner opportunistically iterates "a bit" (1000 iterations in my patch), and only then actually reseeks forward.
This eliminates the problem for me.
I have no good numbers of when it is more efficient to seek ahead (and skip many versions, columns, or rows) (i.e. how many iterations the MemStoreScanner should attempt before seeking).

                
> Gets/Puts with many column send the RegionServer into an "endless" loop
> -----------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6561:
---------------------------------

    Attachment: 6561-0.96-v3.txt

Patch addressing Ted's comments.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many column send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432839#comment-13432839 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

Found two things:
1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit.(But this is not the problem).

2. With jstack I found that the code is stuck in a loop in Memstore.MemstoreScanner.getNext(...)
{code}
Here's the relevant part of the jstack:
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)
{code}



At the same time I find that flush cannot finish:
{code}

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)
{code}


Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized.


So the problem seems to be that MemStoreScanner loops forever in getNext(...). I took a jstack a bunch of times during execution, this always shows up.
Need to dig a bit more, I do not see a good way to deal with this, yet.

                
> Gets/Puts with many column send the RegionServer into an "endless" loop
> -----------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434645#comment-13434645 ] 

Lars Hofhansl commented on HBASE-6561:
--------------------------------------

@N: It would not work if we repeatedly reseek to the *same* key. The iterator would always be one key ahead (hence the highest of the two would always be the next key). Reseek should always be forward, though. So it should work.
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434875#comment-13434875 ] 

nkeywal commented on HBASE-6561:
--------------------------------

+1 for me on v5, thanks Lars for the explanation and the attempt with the iterators!
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434501#comment-13434501 ] 

nkeywal commented on HBASE-6561:
--------------------------------

bq. I could buy that if we never reseek to the current top key.
Hum. I think it would work, as the iterator would be reset to the current top key again.

bq. Also need an extra check then for whether the iterator is exhausted and ignore a call to reseek then.
Agreed, but if we are at the end, may be it has a specific meaning as well (i.e. we're done), so it's not that bad to manage it explicitly?

On the same line, why have you chosen
      kvTail = kvsetAtCreation.tailSet(getHighest(key, kvsetItRow));
      snapshotTail = snapshotAtCreation.tailSet(getHighest(key, snapshotItRow));
vs.
      kvTail = kvTail.tailSet(getHighest(key, kvsetItRow));
      snapshotTail = snapshotTail.tailSet(getHighest(key, snapshotItRow));

?

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436215#comment-13436215 ] 

Hudson commented on HBASE-6561:
-------------------------------

Integrated in HBase-TRUNK #3226 (See [https://builds.apache.org/job/HBase-TRUNK/3226/])
    HBASE-6561 Gets/Puts with many columns send the RegionServer into an 'endless' loop (Revision 1373943)

     Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Compactor.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432976#comment-13432976 ] 

stack commented on HBASE-6561:
------------------------------

bq. all versions of all columns > X only have a readpoint > 1, hence all need to be skipped

I don't follow the above Lars.... So, we seek to column X+1, we need to skip all versions (because they have readpoint > 1) -- so we keep reseeking until the end.  That'd be crazy for sure.

Why does the Put throw off the Get?  Because memStoreTS gets updated during the Get?

(No need to answer... you are probably way beyond the above by now).
                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an "endless" loop

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434305#comment-13434305 ] 

nkeywal commented on HBASE-6561:
--------------------------------

Yes, but you will be resetting the iterators in seekInSubLists, called at the end of reseek.


                
> Gets/Puts with many columns send the RegionServer into an "endless" loop
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6561
>                 URL: https://issues.apache.org/jira/browse/HBASE-6561
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt
>
>
> This came from the mailing this:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
> Carlos

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira