
Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/06/17 23:42:07 UTC

[jira] Created: (HBASE-1537) Intra-row scanning

Intra-row scanning
------------------

                 Key: HBASE-1537
                 URL: https://issues.apache.org/jira/browse/HBASE-1537
             Project: Hadoop HBase
          Issue Type: New Feature
            Reporter: Jonathan Gray
             Fix For: 0.21.0


To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1537:
----------------------------------

    Attachment:     (was: HBASE-1537.patch)

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Commented: (HBASE-1537) Intra-row scanning

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720941#action_12720941 ] 

Jonathan Gray commented on HBASE-1537:
--------------------------------------

The optimization you mention is a worthwhile one we might look into, in order to reduce RPC payload.

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>             Fix For: 0.21.0
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Updated: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1537:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Thanks for the review. Committed. Reopen if insufficient.

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-v1.patch, HBASE-1537-v2.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Commented: (HBASE-1537) Intra-row scanning

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767538#action_12767538 ] 

Jean-Daniel Cryans commented on HBASE-1537:
-------------------------------------------

Is it just me, or does this patch fail to compile for org.apache.hadoop.hbase.regionserver.transactional.TransactionState?

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-v1.patch, HBASE-1537-v2.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Updated: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1537:
----------------------------------

    Attachment: HBASE-1537.patch

Naive implementation of per next() limits via new InternalScanner method. Aiming for a minimalist approach. Needs testing. Find where/if it doesn't work as expected.

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Updated: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1537:
----------------------------------

    Attachment: HBASE-1537-v2.patch

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-v1.patch, HBASE-1537-v2.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Updated: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1537:
----------------------------------

    Attachment: HBASE-1537-2.patch

Better test.

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-2.patch, HBASE-1537-v1.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Updated: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1537:
----------------------------------

    Attachment:     (was: HBASE-1537-2.patch)

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-v1.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Commented: (HBASE-1537) Intra-row scanning

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767544#action_12767544 ] 

Jean-Daniel Cryans commented on HBASE-1537:
-------------------------------------------

OK, sorry, false alarm.

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-v1.patch, HBASE-1537-v2.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Commented: (HBASE-1537) Intra-row scanning

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720940#action_12720940 ] 

Jonathan Gray commented on HBASE-1537:
--------------------------------------

Result is actually just KeyValue[]... Each KeyValue holds ALL the data: row, family, qualifier, timestamp, value, and type.  So we already have all the information we need to put things back in any way we want.

The byte [] row inside Result is actually computed when you ask for it, and it just grabs the row from the first KV, so it's actually already capable from a data structure perspective to hold multiple rows.

Good stuff, Andrew.
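
To illustrate the point about Result being a flat array of self-describing KeyValues, here is a minimal sketch. SketchKeyValue and SketchResult are hypothetical stand-ins written for this example, not the real HBase classes. It assumes only what the comment states: each entry carries its own row, and the row of the container is derived on demand from the first entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a KeyValue: every entry carries its full coordinates.
final class SketchKeyValue {
    final String row, family, qualifier, value;
    final long timestamp;

    SketchKeyValue(String row, String family, String qualifier, long timestamp, String value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }
}

// Hypothetical stand-in for a Result: nothing more than an array of KeyValues.
final class SketchResult {
    private final SketchKeyValue[] kvs;

    SketchResult(SketchKeyValue[] kvs) {
        this.kvs = kvs;
    }

    // The row is not stored separately; it is read from the first KeyValue on demand.
    String getRow() {
        return kvs.length == 0 ? null : kvs[0].row;
    }

    // Because each KeyValue names its own row, a flat array can be regrouped by row.
    // LinkedHashMap keeps rows in the order they first appear.
    Map<String, List<SketchKeyValue>> byRow() {
        Map<String, List<SketchKeyValue>> rows = new LinkedHashMap<>();
        for (SketchKeyValue kv : kvs) {
            rows.computeIfAbsent(kv.row, r -> new ArrayList<>()).add(kv);
        }
        return rows;
    }
}
```

In this sketch nothing in the container prevents entries from different rows from sitting side by side, which is the sense in which the data structure could already hold multiple rows.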

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>             Fix For: 0.21.0
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Commented: (HBASE-1537) Intra-row scanning

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765884#action_12765884 ] 

Jonathan Gray commented on HBASE-1537:
--------------------------------------

Patch looks great, Andrew.  I'm probably going to apply this into a 0.20 branch dev cluster soon and play around with some gigantic rows.

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-v1.patch, HBASE-1537-v2.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Updated: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1537:
----------------------------------

    Status: Patch Available  (was: Open)

Also versions Scan. 

If Scan had been versioned already, this could have been backported to 0.20 branch.

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-v1.patch, HBASE-1537-v2.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Commented: (HBASE-1537) Intra-row scanning

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764946#action_12764946 ] 

stack commented on HBASE-1537:
------------------------------

Patch looks good.  Only thing is that maybe the test should check that a Result does not contain results that span rows?  (We're not supposed to cross rows inside a call to next, right?)
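
The kind of check suggested here could look roughly like the following. This is only a sketch: RowSpanCheck and isSingleRow are hypothetical names, and the row keys are passed in as plain strings rather than read from a real Result.

```java
import java.util.List;

// Hypothetical helper for a test: verifies that one batch returned by a single
// next() call contains cells from at most one row.
final class RowSpanCheck {
    /** Returns true when every row key equals the first one; an empty batch passes. */
    static boolean isSingleRow(List<String> rowKeys) {
        for (String key : rowKeys) {
            if (!key.equals(rowKeys.get(0))) {
                return false;
            }
        }
        return true;
    }
}
```

A test would call this on the row keys of each batch and fail as soon as it returns false.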

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-v1.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Commented: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720934#action_12720934 ] 

Andrew Purtell commented on HBASE-1537:
---------------------------------------

What we did for Stargate scanners is make them iterators over cells and then allow scanners to specify the number of cells they'd like to have come back in one batch. The internal mechanics are more complicated for region servers to do this, but I think similar semantics would be good. How to handle crossing row boundaries presents a couple of options:

- Include row key as well as column and timestamp with each cell value. This is not as expensive as it might sound if a simple string table encoding is used with a marker or two meaning "use last given row key" and "use last given column". Either Thrift or pbufs can handle this by marking row and column keys as optional. 

- Make Result capable of holding more than one row. 

- Return early to the client at row boundary and make it do scanner.next() to start up again on the next row. 
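
The first option above can be sketched as follows. This is an illustrative model only, assuming a null field plays the role of the "use last given row key" and "use last given column" markers. Cell, WireCell and CellBatchCodec are hypothetical names and do not correspond to Stargate, Thrift or protobuf types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fully qualified cell, as the scanner would produce it.
final class Cell {
    final String row, column, value;
    final long timestamp;

    Cell(String row, String column, long timestamp, String value) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.value = value;
    }
}

// Hypothetical wire form: row and column are optional.
final class WireCell {
    final String row;     // null means "use last given row key"
    final String column;  // null means "use last given column"
    final long timestamp;
    final String value;

    WireCell(String row, String column, long timestamp, String value) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.value = value;
    }
}

final class CellBatchCodec {
    // Drops the row (and column) whenever it repeats the previous cell's.
    static List<WireCell> encode(List<Cell> cells) {
        List<WireCell> out = new ArrayList<>();
        String lastRow = null;
        String lastColumn = null;
        for (Cell c : cells) {
            String row = c.row.equals(lastRow) ? null : c.row;
            // The column is always resent when the row changes.
            String column = (row == null && c.column.equals(lastColumn)) ? null : c.column;
            out.add(new WireCell(row, column, c.timestamp, c.value));
            lastRow = c.row;
            lastColumn = c.column;
        }
        return out;
    }

    // Restores full cells by carrying the last seen row and column forward.
    static List<Cell> decode(List<WireCell> wire) {
        List<Cell> out = new ArrayList<>();
        String lastRow = null;
        String lastColumn = null;
        for (WireCell w : wire) {
            if (w.row != null) lastRow = w.row;
            if (w.column != null) lastColumn = w.column;
            if (lastRow == null || lastColumn == null) {
                throw new IllegalArgumentException("first cell must carry row and column");
            }
            out.add(new Cell(lastRow, lastColumn, w.timestamp, w.value));
        }
        return out;
    }
}
```

With many versions or columns per row, most cells in a batch would travel without a row key under this scheme, which is why carrying the row with every cell need not be expensive.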

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>             Fix For: 0.21.0
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Commented: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761685#action_12761685 ] 

Andrew Purtell commented on HBASE-1537:
---------------------------------------

Patch soon with first cut. Will add a bit of state to scanner and change next() semantics on the client such that more than one call to next() may be needed to retrieve the full row. Next() will not span rows. New next() behavior will be configurable, off by default, toggled by config var or Scan parameter.
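
The semantics described above can be modelled with a small in-memory sketch. It is not the patch itself: Kv, BatchingScanner and BatchedScanDemo are hypothetical names, and the convention that a batch value of 0 or less means "limit off" is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a KeyValue; only the fields the sketch needs.
final class Kv {
    final String row, qualifier;

    Kv(String row, String qualifier) {
        this.row = row;
        this.qualifier = qualifier;
    }
}

// Hypothetical scanner over cells already sorted by row.
final class BatchingScanner {
    private final List<Kv> sorted;
    private final int batch; // assumed convention: <= 0 turns the limit off
    private int pos = 0;     // the "bit of state": where the last next() stopped

    BatchingScanner(List<Kv> sorted, int batch) {
        this.sorted = sorted;
        this.batch = batch;
    }

    /** Returns the next chunk of the current row, or an empty list when the scan is done. */
    List<Kv> next() {
        List<Kv> out = new ArrayList<>();
        if (pos >= sorted.size()) {
            return out;
        }
        String row = sorted.get(pos).row;
        // Stop at the batch limit or at the row boundary, whichever comes first.
        while (pos < sorted.size()
                && sorted.get(pos).row.equals(row)
                && (batch <= 0 || out.size() < batch)) {
            out.add(sorted.get(pos++));
        }
        return out;
    }
}

final class BatchedScanDemo {
    // Client side: keep calling next() and stitch chunks back into whole rows.
    static List<List<Kv>> readRows(BatchingScanner scanner) {
        List<List<Kv>> rows = new ArrayList<>();
        List<Kv> chunk;
        while (!(chunk = scanner.next()).isEmpty()) {
            boolean sameRow = !rows.isEmpty()
                    && rows.get(rows.size() - 1).get(0).row.equals(chunk.get(0).row);
            if (sameRow) {
                rows.get(rows.size() - 1).addAll(chunk);
            } else {
                rows.add(new ArrayList<>(chunk));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Kv> kvs = new ArrayList<>();
        for (int i = 0; i < 5; i++) kvs.add(new Kv("r1", "q" + i));
        for (int i = 0; i < 2; i++) kvs.add(new Kv("r2", "q" + i));
        for (List<Kv> row : readRows(new BatchingScanner(kvs, 2))) {
            System.out.println(row.get(0).row + ": " + row.size() + " cells");
        }
    }
}
```

With the limit off, each next() returns a whole row as before. With a limit set, a wide row arrives over several calls and the caller decides whether to reassemble it or process it chunk by chunk.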

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Assigned: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reassigned HBASE-1537:
-------------------------------------

    Assignee: Andrew Purtell

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.


[jira] Updated: (HBASE-1537) Intra-row scanning

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1537:
----------------------------------

    Attachment: HBASE-1537-v1.patch

With simple testcase. Seems to work. 

> Intra-row scanning
> ------------------
>
>                 Key: HBASE-1537
>                 URL: https://issues.apache.org/jira/browse/HBASE-1537
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Jonathan Gray
>            Assignee: Andrew Purtell
>             Fix For: 0.21.0
>
>         Attachments: HBASE-1537-v1.patch
>
>
> To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time.  Currently, an entire row must come back as one piece.
