Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/10/01 14:21:23 UTC
[jira] Updated: (HBASE-1481) Add fast row key only scanning
[ https://issues.apache.org/jira/browse/HBASE-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Gray updated HBASE-1481:
---------------------------------
Attachment: HBASE-1481-v1.patch
The patch adds a new filter called FirstKeyOnlyFilter. It is extremely simple, but it generally accomplishes what we want.
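To illustrate the idea behind such a filter, here is a minimal sketch in plain Java. It does not use the HBase API and is not taken from the patch; the class FirstKeyOnlySketch, the Cell type, and the method firstCellPerRow are hypothetical names chosen for illustration. It models scan output as a list of cells sorted by row and keeps only the first cell of each row, so the number of cells returned equals the number of rows.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstKeyOnlySketch {
    /** Hypothetical stand-in for a KeyValue: only a row key and a column name. */
    static final class Cell {
        final String row;
        final String column;
        Cell(String row, String column) { this.row = row; this.column = column; }
    }

    /**
     * Keeps the first cell of every row and drops the rest.
     * Assumes the input is already sorted by row key.
     */
    static List<Cell> firstCellPerRow(List<Cell> sortedCells) {
        List<Cell> out = new ArrayList<>();
        String currentRow = null;
        for (Cell c : sortedCells) {
            if (!c.row.equals(currentRow)) {
                out.add(c);          // first cell seen for this row
                currentRow = c.row;  // later cells of the same row are skipped
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row1", "a"), new Cell("row1", "b"), new Cell("row1", "c"),
            new Cell("row2", "a"),
            new Cell("row3", "b"), new Cell("row3", "c"));
        List<Cell> firsts = firstCellPerRow(cells);
        System.out.println("rows counted: " + firsts.size());
    }
}
```

Note that this toy version still walks every cell of every row; it only reduces what is returned to the caller.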
The only further optimizations to row counting I can think of:
- prevent sending back even an entire KV per row (all we really need is the count, but this breaks the API)
- once we work on issues like HBASE-1517, we should seek to the next row after we look at the first KV (if we have a million columns in a row, we should not need to iterate over all of them to do a row count)
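The second point can be sketched with a toy model in plain Java. This is not HBase code and says nothing about how HBASE-1517 would implement it; the class SeekNextRowSketch, its methods, and the flat "row:column" key encoding are assumptions made purely for illustration (row keys are assumed never to contain ':'). It compares counting rows by stepping through every cell with counting rows by jumping past the remaining cells of a row once its first cell has been seen.

```java
import java.util.TreeSet;

public class SeekNextRowSketch {
    // Separator between row key and column in the flat sorted key.
    // Assumption for this toy model: row keys never contain this character.
    static final char SEP = ':';

    /** Counts rows by stepping through every cell; returns {rowCount, cellsVisited}. */
    static long[] countByIterating(TreeSet<String> cells) {
        long rows = 0, visited = 0;
        String currentRow = null;
        for (String key : cells) {
            visited++;
            String row = key.substring(0, key.indexOf(SEP));
            if (!row.equals(currentRow)) { rows++; currentRow = row; }
        }
        return new long[] {rows, visited};
    }

    /** Counts rows by seeking past the rest of a row after its first cell. */
    static long[] countBySeeking(TreeSet<String> cells) {
        long rows = 0, visited = 0;
        String key = cells.isEmpty() ? null : cells.first();
        while (key != null) {
            visited++;
            rows++;
            String row = key.substring(0, key.indexOf(SEP));
            // row + (SEP + 1) sorts after every "row:column" key of this row,
            // so ceiling() lands on the first cell of the next row (or null).
            key = cells.ceiling(row + (char) (SEP + 1));
        }
        return new long[] {rows, visited};
    }

    public static void main(String[] args) {
        TreeSet<String> cells = new TreeSet<>();
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 1000; c++)
                cells.add("row" + r + SEP + "col" + c);
        long[] a = countByIterating(cells), b = countBySeeking(cells);
        System.out.println("iterate: rows=" + a[0] + " visited=" + a[1]);
        System.out.println("seek:    rows=" + b[0] + " visited=" + b[1]);
    }
}
```

With 3 rows of 1000 columns each, the iterating version examines 3000 cells and the seeking version examines 3. In this model each seek is a logarithmic-time lookup in a TreeSet; the cost of a seek in a real store would differ, so the counts only show how many cells are examined, not actual scan time.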
The latter issue gets me thinking about what filters could do to push that kind of information to the QueryMatcher....
> Add fast row key only scanning
> ------------------------------
>
> Key: HBASE-1481
> URL: https://issues.apache.org/jira/browse/HBASE-1481
> Project: Hadoop HBase
> Issue Type: Improvement
> Affects Versions: 0.19.3
> Reporter: Lars George
> Priority: Minor
> Fix For: 0.21.0
>
> Attachments: HBASE-1481-v1.patch
>
>
> Instead of requiring a user to set up a scanner with any column and scan the table to gather all row keys while ignoring the column value, we should have a fast and lightweight scanner that, for example, takes a "null" for the column list and then simply returns only the matching keys of all non-empty or deleted rows. Filters should still be applicable.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.