Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/10/01 14:21:23 UTC

[jira] Updated: (HBASE-1481) Add fast row key only scanning

     [ https://issues.apache.org/jira/browse/HBASE-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1481:
---------------------------------

    Attachment: HBASE-1481-v1.patch

The patch adds a new filter called FirstKeyOnlyFilter, which returns only the first KV of each row.  It's extremely simple, but it generally accomplishes what we want.
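
To make the idea concrete, here is a minimal, self-contained sketch of what a "first key only" filter does to a scan. It is a toy model, not the code in the patch: the KV class, firstKeyOnly() and countRows() are hypothetical names invented for illustration, and the input is assumed to arrive sorted by row, as a scanner would deliver it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: these types are hypothetical stand-ins, not HBase classes. */
final class FirstKeyOnlySketch {

    /** A simplified KeyValue: row key, column qualifier, value. */
    static final class KV {
        final String row;
        final String qualifier;
        final String value;

        KV(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /**
     * Keeps the first KV of each row and drops the rest.
     * Assumes the input is sorted by row.
     */
    static List<KV> firstKeyOnly(List<KV> sortedKvs) {
        List<KV> out = new ArrayList<>();
        String currentRow = null;
        for (KV kv : sortedKvs) {
            if (!kv.row.equals(currentRow)) {
                out.add(kv);          // first KV seen for this row
                currentRow = kv.row;  // later KVs of the same row are skipped
            }
        }
        return out;
    }

    /** The row count is the number of KVs that survive the filter. */
    static int countRows(List<KV> sortedKvs) {
        return firstKeyOnly(sortedKvs).size();
    }

    public static void main(String[] args) {
        List<KV> kvs = new ArrayList<>();
        kvs.add(new KV("row1", "a", "1"));
        kvs.add(new KV("row1", "b", "2"));
        kvs.add(new KV("row2", "a", "3"));
        System.out.println(countRows(kvs)); // two distinct rows
    }
}
```

Note that the loop in this sketch still visits every KV in the input; only the result is reduced to one KV per row.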

The only further optimizations to row counting I can think of:

- prevent sending back even an entire KV per row (all we really need is the count, but this breaks the API)
- once we work on issues like HBASE-1517, we should seek to the next row after we look at the first KV (if we have a million columns in a row, we don't need to iterate over all of them to do a row count)

The latter issue gets me thinking about what filters could do to push that kind of information to the QueryMatcher....
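
As a rough illustration of the "seek to the next row" idea, the sketch below counts rows in a flat, sorted key set by jumping past each row instead of walking its columns. This is not how the QueryMatcher works; it is a toy model under stated assumptions: keys are hypothetical strings of the form row + '\u0000' + qualifier, row keys never contain '\u0000', and a java.util.TreeSet stands in for the sorted store.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy model only: a TreeSet stands in for a sorted store of KV keys. */
final class SeekNextRowSketch {

    /** Hypothetical separator; row keys are assumed never to contain it. */
    static final char SEP = '\u0000';

    static String key(String row, String qualifier) {
        return row + SEP + qualifier;
    }

    static String rowOf(String key) {
        return key.substring(0, key.indexOf(SEP));
    }

    /**
     * Counts rows by looking at the first key of a row, then seeking
     * directly to the first key of the following row.
     * Returns {rowCount, keysTouched}.
     */
    static long[] countRowsBySeeking(NavigableSet<String> sortedKeys) {
        long rows = 0;
        long touched = 0;
        String current = sortedKeys.isEmpty() ? null : sortedKeys.first();
        while (current != null) {
            rows++;
            touched++;
            // Smallest possible key that sorts after every key of this row.
            String pastThisRow = rowOf(current) + (char) (SEP + 1);
            current = sortedKeys.ceiling(pastThisRow); // null when no rows remain
        }
        return new long[] {rows, touched};
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>();
        for (String row : new String[] {"row1", "row2", "row3"}) {
            for (int i = 0; i < 1000; i++) {
                keys.add(key(row, "c" + i));
            }
        }
        long[] result = countRowsBySeeking(keys);
        System.out.println("rows=" + result[0] + " keysTouched=" + result[1]);
    }
}
```

With three rows of 1000 columns each, the loop touches one key per row rather than all 3000. Each seek is a logarithmic lookup in the TreeSet, so the saving matters most when rows are wide.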

> Add fast row key only scanning
> ------------------------------
>
>                 Key: HBASE-1481
>                 URL: https://issues.apache.org/jira/browse/HBASE-1481
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.3
>            Reporter: Lars George
>            Priority: Minor
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1481-v1.patch
>
>
> Instead of requiring a user to set up a scanner with any column and scan the table to gather all row keys while ignoring the column values, we should have a fast and lightweight scanner that, for example, takes a "null" for the column list and then simply returns only the matching keys of all non-empty or deleted rows. Filters should still be applicable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.