Posted to dev@hbase.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/10/07 09:10:31 UTC
[jira] Created: (HBASE-1891) Scanner is jumping a full cache of rows
Scanner is jumping a full cache of rows
---------------------------------------
Key: HBASE-1891
URL: https://issues.apache.org/jira/browse/HBASE-1891
Project: Hadoop HBase
Issue Type: Bug
Components: client
Reporter: Zheng Shao
I ran my program multiple times, and this happens almost every time.
Basically, the ResultScanner is skipping a full cache of rows. I set my caching size to 1000, and when I expected to see row 60000 it gave me 60999.
My code (will attach here) creates a new table with a single column family and qualifier, writes 1 million rows with ascending keys, then immediately reads them back to verify.
The HBase code is from:
http://svn.apache.org/repos/asf/hadoop/hbase/trunk at
Exported revision 821973.
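For readers without the attachment, the read-back check described above can be sketched roughly as follows. This is not Tester.java; it is a minimal stand-alone illustration that assumes row keys are plain ascending longs and replaces the HBase ResultScanner with an ordinary Iterator. The class and method names (SequentialCheck, firstGap) are hypothetical; the error message format is the one quoted elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SequentialCheck {
    /**
     * Walks the rows in order and compares each one with the next expected key.
     * Returns the first expected key that did not show up, or -1 if none is missing.
     */
    public static long firstGap(Iterator<Long> rows, long start) {
        long expect = start;
        while (rows.hasNext()) {
            long v = rows.next();
            if (v != expect) {
                System.err.println("Missing " + expect + " got " + v + " instead!");
                return expect;
            }
            expect++;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in for a scan result (assumption: not real HBase output).
        // Rows 60000..60998 are left out to mimic the reported symptom.
        List<Long> rows = new ArrayList<>();
        for (long i = 0; i < 61000; i++) {
            if (i < 60000 || i >= 60999) {
                rows.add(i);
            }
        }
        System.out.println(firstGap(rows.iterator(), 0));
    }
}
```

With the simulated gap, the check stops at the first missing key (60000) and reports that 60999 arrived in its place.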
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769708#action_12769708 ]
stack commented on HBASE-1891:
------------------------------
Do you still see this, Zheng?
[jira] Updated: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HBASE-1891:
------------------------------
Attachment: Tester.java
This is the code.
To run the code, compile it into a jar, add the jar to CLASSPATH, and start:
{code}
bin/hbase com.example.hbase.Tester -rows 2000000 -length 100 -autoflush false -writebuffersize 250000 -caching 1000
{code}
[jira] Updated: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "Amandeep Khurana (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amandeep Khurana updated HBASE-1891:
------------------------------------
Assignee: Amandeep Khurana (was: stack)
[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "Amandeep Khurana (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764189#action_12764189 ]
Amandeep Khurana commented on HBASE-1891:
-----------------------------------------
Looking into it. I'll try to replicate it and see what's going on.
[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764358#action_12764358 ]
stack commented on HBASE-1891:
------------------------------
I like your little Tester program, Zheng.
Help me out. I think I figured out how to use it. After loading a table, I ran it multiple times with the following arguments:
{code}
-read true -caching 1000 -create false -write false
{code}
I keep doing it, multiple times, and I get this -- success -- every time it seems.
{code}
09/10/10 09:46:51 DEBUG client.HTable$ClientScanner: Creating scanner over myLittleHBaseTable starting at key ''
09/10/10 09:46:51 DEBUG client.HTable$ClientScanner: Advancing internal scanner to startKey at ''
09/10/10 09:46:51 DEBUG client.HConnectionManager$TableServers: Cache hit for row <> in tableName myLittleHBaseTable: location server 192.168.1.149:61385, location region name myLittleHBaseTable,,1255192797711
09/10/10 09:46:52 INFO Tester: 152001 records/second
09/10/10 09:46:53 INFO Tester: 171502 records/second
09/10/10 09:46:54 INFO Tester: 149498 records/second
09/10/10 09:46:55 INFO Tester: 162000 records/second
09/10/10 09:46:56 INFO Tester: 195338 records/second
09/10/10 09:46:57 INFO Tester: 150662 records/second
09/10/10 09:46:57 DEBUG client.HTable$ClientScanner: Finished with scanning at REGION => {NAME => 'myLittleHBaseTable,,1255192797711', STARTKEY => '', ENDKEY => '', ENCODED => 684003008, TABLE => {{NAME => 'myLittleHBaseTable', FAMILIES => [{NAME => 'myLittleFamily', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
Data verified successfully!
Average speed: 158252 records/second
{code}
I get the same result whether I run it inside Eclipse or on the command line.
I'm on hbase trunk (r823874).
[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763843#action_12763843 ]
Zheng Shao commented on HBASE-1891:
-----------------------------------
Thanks for helping out, stack! I really appreciate it.
[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764187#action_12764187 ]
Zheng Shao commented on HBASE-1891:
-----------------------------------
Thanks Amandeep!
Actually, the skip is one full cache less one row: "I set my caching size to 1000, and when I expect to see row 60000 it gave me 60999."
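The arithmetic behind that correction can be spelled out in a few lines. This is only a worked illustration of the numbers quoted above; the class and method names (SkipSize, skippedRows) are hypothetical and nothing here touches HBase.

```java
public class SkipSize {
    /** Number of rows that never arrived between the expected key and the key actually returned. */
    public static long skippedRows(long expected, long got) {
        return got - expected;
    }

    public static void main(String[] args) {
        long caching = 1000;                       // scanner caching size from the report
        long skipped = skippedRows(60000, 60999);  // rows 60000..60998 are missing
        System.out.println(skipped);               // 999
        System.out.println(skipped == caching - 1);
    }
}
```

So the scanner jumped 999 rows, which is the caching size minus one, not a full 1000.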
[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764372#action_12764372 ]
stack commented on HBASE-1891:
------------------------------
I tried it on the head of the 0.20 hbase branch, but it doesn't seem to fail (I'm doing ~200k scans a second on my little Mac OS X laptop). On error, I'm expecting it to run this: "System.err.println("Missing " + expect + " got " + v + " instead!");". Is that right?
[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764359#action_12764359 ]
stack commented on HBASE-1891:
------------------------------
I do not see anything in the diff between my version and yours that would change scanning behavior. Let me check the 0.20 branch of hbase.
[jira] Commented: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763831#action_12763831 ]
stack commented on HBASE-1891:
------------------------------
This looks bad.... let me check it out....
[jira] Assigned: (HBASE-1891) Scanner is jumping a full cache of rows
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack reassigned HBASE-1891:
----------------------------
Assignee: stack