Posted to dev@hbase.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/10/07 09:10:31 UTC

[jira] Created: (HBASE-1891) Scanner is jumping a full cache of rows

Scanner is jumping a full cache of rows
---------------------------------------

                 Key: HBASE-1891
                 URL: https://issues.apache.org/jira/browse/HBASE-1891
             Project: Hadoop HBase
          Issue Type: Bug
          Components: client
            Reporter: Zheng Shao


I ran my program multiple times, and this happens almost every time.

Basically, the ResultScanner is skipping a full cache of rows. I set my caching size to 1000, and when I expected to see row 60000, it gave me 60999.

My code (will attach here) creates a new table with a single column family and qualifier, writes 1 million rows with ascending keys, and then immediately reads them back to verify.
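
For illustration only, the read-back check described above might look roughly like the sketch below. It is not the attached Tester.java and does not touch HBase at all: the class name ScanGapCheck, the helpers firstGap and simulate, and the use of plain long keys are assumptions made for this example. The error text is modelled on the println line quoted later in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch (not the attached Tester.java): verifies that a scan
 * returns the ascending keys 0..rows-1 with no gaps.
 */
public class ScanGapCheck {

    /** Returns a message describing the first problem found, or null if the scan is complete. */
    static String firstGap(Iterator<Long> scanned, long rows) {
        long expect = 0;
        while (scanned.hasNext()) {
            long v = scanned.next();
            if (v != expect) {
                return "Missing " + expect + " got " + v + " instead!";
            }
            expect++;
        }
        if (expect != rows) {
            return "Scan ended after " + expect + " rows, expected " + rows;
        }
        return null;
    }

    /** Builds keys 0..rows-1, leaving out skipCount keys starting at skipAt (simulated scanner skip). */
    static List<Long> simulate(long rows, long skipAt, long skipCount) {
        List<Long> keys = new ArrayList<>();
        for (long k = 0; k < rows; k++) {
            if (k >= skipAt && k < skipAt + skipCount) {
                continue; // this key is "skipped" by the simulated scanner
            }
            keys.add(k);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Simulate the reported symptom: rows 60000..60998 never come back.
        List<Long> keys = simulate(62000, 60000, 999);
        System.out.println(firstGap(keys.iterator(), 62000));
        // prints: Missing 60000 got 60999 instead!
    }
}
```

In a real test the iterator would be backed by a ResultScanner and the keys decoded from each row; that wiring is left out here because it depends on the HBase client version in use.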


The HBase code is from:
http://svn.apache.org/repos/asf/hadoop/hbase/trunk at
Exported revision 821973.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769708#action_12769708 ] 

stack commented on HBASE-1891:
------------------------------

Do you still see this, Zheng?



[jira] Updated: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HBASE-1891:
------------------------------

    Attachment: Tester.java

This is the code.

To run the code, compile it into a jar, add the jar to CLASSPATH, and start:

{code}
bin/hbase com.example.hbase.Tester -rows 2000000 -length 100 -autoflush false -writebuffersize 250000 -caching 1000
{code}




[jira] Updated: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "Amandeep Khurana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amandeep Khurana updated HBASE-1891:
------------------------------------

    Assignee: Amandeep Khurana  (was: stack)



[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "Amandeep Khurana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764189#action_12764189 ] 

Amandeep Khurana commented on HBASE-1891:
-----------------------------------------

Looking into it. I'll try to replicate it and see what's going on.



[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764358#action_12764358 ] 

stack commented on HBASE-1891:
------------------------------

I like your little Tester program Zheng.

Help me out. I think I figured out how to use it. After loading a table, I ran it multiple times with the following arguments:

{code}
-read true -caching 1000 -create false -write false
{code}

I ran it repeatedly, and it seems to succeed every time:

{code}
09/10/10 09:46:51 DEBUG client.HTable$ClientScanner: Creating scanner over myLittleHBaseTable starting at key ''
09/10/10 09:46:51 DEBUG client.HTable$ClientScanner: Advancing internal scanner to startKey at ''
09/10/10 09:46:51 DEBUG client.HConnectionManager$TableServers: Cache hit for row <> in tableName myLittleHBaseTable: location server 192.168.1.149:61385, location region name myLittleHBaseTable,,1255192797711
09/10/10 09:46:52 INFO Tester: 152001 records/second
09/10/10 09:46:53 INFO Tester: 171502 records/second
09/10/10 09:46:54 INFO Tester: 149498 records/second
09/10/10 09:46:55 INFO Tester: 162000 records/second
09/10/10 09:46:56 INFO Tester: 195338 records/second
09/10/10 09:46:57 INFO Tester: 150662 records/second
09/10/10 09:46:57 DEBUG client.HTable$ClientScanner: Finished with scanning at REGION => {NAME => 'myLittleHBaseTable,,1255192797711', STARTKEY => '', ENDKEY => '', ENCODED => 684003008, TABLE => {{NAME => 'myLittleHBaseTable', FAMILIES => [{NAME => 'myLittleFamily', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
Data verified successfully!
Average speed: 158252 records/second
{code}

I get the same result whether I run it inside Eclipse or on the command line.

I'm on hbase trunk (r823874).



[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763843#action_12763843 ] 

Zheng Shao commented on HBASE-1891:
-----------------------------------

Thanks for helping out, stack! I really appreciate it.




[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764187#action_12764187 ] 

Zheng Shao commented on HBASE-1891:
-----------------------------------

Thanks Amandeep!
Actually, the skip is one full cache less one row: "I set my caching size to 1000, and when I expect to see row 60000 it gave me 60999."
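
As a quick check of that arithmetic, here is a small sketch. The class name SkipSize and the helper skipped are made up for this example and are not part of Tester.java.

```java
/** Hypothetical helper: how many rows lie between the expected key and the key returned. */
public class SkipSize {

    static long skipped(long expected, long got) {
        return got - expected;
    }

    public static void main(String[] args) {
        long caching = 1000;
        long skipped = skipped(60000, 60999); // rows 60000..60998 are missing
        System.out.println(skipped + " rows skipped; caching - 1 = " + (caching - 1));
        // prints: 999 rows skipped; caching - 1 = 999
    }
}
```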




[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764372#action_12764372 ] 

stack commented on HBASE-1891:
------------------------------

I tried it on the head of the 0.20 hbase branch, but it doesn't seem to fail (I'm doing ~200k scans a second on my little Mac OS X laptop). On error, I'm expecting it to hit this line: System.err.println("Missing " + expect + " got " + v + " instead!"); Is that right?



[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764359#action_12764359 ] 

stack commented on HBASE-1891:
------------------------------

I do not see anything in the diff between my version and yours that would change scanning behavior. Let me check the 0.20 branch of hbase.



[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763831#action_12763831 ] 

stack commented on HBASE-1891:
------------------------------

This looks bad.... let me check it out....



[jira] Assigned: (HBASE-1891) Scanner is jumping a full cache of rows

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-1891:
----------------------------

    Assignee: stack
