Posted to dev@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2008/07/15 23:39:31 UTC

[jira] Created: (HBASE-747) Add a simple way to do batch updates of many rows

Add a simple way to do batch updates of many rows
-------------------------------------------------

                 Key: HBASE-747
                 URL: https://issues.apache.org/jira/browse/HBASE-747
             Project: Hadoop HBase
          Issue Type: New Feature
          Components: client
    Affects Versions: 0.2.0
            Reporter: Jean-Daniel Cryans
            Assignee: Jean-Daniel Cryans
             Fix For: 0.2.0


Add a simple way to do batch updates of many rows as described in HBASE-48.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613804#action_12613804 ] 

Jim Kellerman commented on HBASE-747:
-------------------------------------

Taking out a read lock to prevent splits seems reasonable provided that the batch is not huge. But suppose someone submitted 1GB in a single batch? It would take quite a while for HBase to recover from that.
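
A minimal sketch of the read-lock idea, for illustration only. The class and method names below are hypothetical stand-ins, not the actual HBase region code: a batch holds the read lock while it is applied, and a split needs the write lock, so the split cannot begin until every running batch has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region; not the actual HBase region class.
class SplitGuardedRegion {
  private final ReentrantReadWriteLock splitLock = new ReentrantReadWriteLock();
  private final List<String> appliedRows = new ArrayList<>();

  // Applies a batch while holding the read lock, so a split cannot start
  // mid-batch. Several batches may hold the read lock at the same time.
  void applyBatch(List<String> rows) {
    splitLock.readLock().lock();
    try {
      synchronized (appliedRows) {
        appliedRows.addAll(rows);
      }
    } finally {
      splitLock.readLock().unlock();
    }
  }

  // A split needs the write lock; this version gives up immediately
  // if any batch is still running instead of waiting for it.
  boolean trySplit() {
    if (!splitLock.writeLock().tryLock()) {
      return false;
    }
    try {
      // The split itself would happen here.
      return true;
    } finally {
      splitLock.writeLock().unlock();
    }
  }

  int appliedCount() {
    synchronized (appliedRows) {
      return appliedRows.size();
    }
  }
}
```

The longer a batch holds the read lock, the longer a split is held off, which is why the size of the batch matters here.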

> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.1, 0.3.0
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.



[jira] Updated: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-747:
--------------------------------

    Fix Version/s:     (was: 0.2.0)
                   0.3.0
                   0.2.1

> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.1, 0.3.0
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.



[jira] Updated: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-747:
--------------------------------

    Fix Version/s:     (was: 0.2.1)
                       (was: 0.3.0)
                   0.2.0

> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.0
>
>         Attachments: hbase-747-simple-v2.patch, hbase-747-simple.patch
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.



[jira] Commented: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613811#action_12613811 ] 

Jean-Daniel Cryans commented on HBASE-747:
------------------------------------------

Well, I was thinking of implementing only the simple version, i.e. wrapping a bunch of row mutations in a single update. I see two options for how the client could interact with the region servers:

 - The simplest way: the client does the equivalent of committing a series of BatchUpdates in a loop. There would be no gain for those using the API directly, but those using Thrift would have far fewer RPCs to make. The code would be something like:

{code}
  public synchronized void commit(final BatchUpdate batchUpdate)
  throws IOException {
    // One RPC per row: each row update goes to the region server hosting that row.
    for (BatchRowUpdate rowUpdate : batchUpdate.getRowUpdates()) {
      connection.getRegionServerWithRetries(
        new ServerCallable<Boolean>(connection, tableName, rowUpdate.getRow()) {
          ...
        });
    }
  }
{code}

 - A better version would group the rows that go to the same server, reducing the number of RPCs between the client and the region servers.
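
To make the second option concrete, here is a rough sketch of the grouping step. It is not taken from the patch: `RowGrouping`, `groupByServer` and the `locate` function are hypothetical names, and row keys are plain strings for brevity. Each resulting group could then be sent in a single RPC.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class RowGrouping {
  // Groups row keys by the server that hosts them, keeping servers in
  // first-seen order and rows in their original order within each group.
  // 'locate' is a hypothetical lookup from row key to server name.
  static Map<String, List<String>> groupByServer(
      List<String> rows, Function<String, String> locate) {
    Map<String, List<String>> byServer = new LinkedHashMap<>();
    for (String row : rows) {
      byServer.computeIfAbsent(locate.apply(row), s -> new ArrayList<>()).add(row);
    }
    return byServer;
  }
}
```

With this shape, the number of RPCs is bounded by the number of servers involved rather than by the number of rows in the update.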

I think the first one would be doable for 0.2.0 (which is why I put it there in the first place).

Jim, if there's a 1GB update, I guess the client will first have to handle a massive OOME ;) But yeah, this solution does not handle big updates (the javadoc would have to reflect this); the more badass one I described in HBASE-48 would.

Finally, I think we should put such a facility in 0.2.0 since this release is supposed to offer a stable API. Then we can implement something more efficient for 0.3.0 or in a 0.2 minor revision.

> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.1, 0.3.0
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.



[jira] Commented: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614103#action_12614103 ] 

Jim Kellerman commented on HBASE-747:
-------------------------------------

+1


> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.1, 0.3.0
>
>         Attachments: hbase-747-simple-v2.patch, hbase-747-simple.patch
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.



[jira] Updated: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-747:
-------------------------------------

    Attachment: hbase-747-simple-v2.patch

Version without the RowsBatchUpdate

> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.1, 0.3.0
>
>         Attachments: hbase-747-simple-v2.patch, hbase-747-simple.patch
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.



[jira] Resolved: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HBASE-747.
---------------------------------

    Resolution: Fixed

Committed. Thanks for the patch, Jean-Daniel.

> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.0
>
>         Attachments: hbase-747-simple-v2.patch, hbase-747-simple.patch
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.



[jira] Commented: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613798#action_12613798 ] 

stack commented on HBASE-747:
-----------------------------

I was thinking that a batchupdate would take out a read lock preventing splits?

> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.1, 0.3.0
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.



[jira] Commented: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613797#action_12613797 ] 

Jim Kellerman commented on HBASE-747:
-------------------------------------

Somehow the client needs to sort row updates so that whatever batch is sent to a region server contains only rows that the region server is hosting. Any rows it is not hosting (because they are in a different region served by a different region server) will error out, so the client would be responsible for partitioning updates to the correct servers.
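
One way the client could decide which server hosts a given row, sketched under simplifying assumptions: regions are identified only by their start key, row keys are strings rather than byte arrays, and `RegionLookup` with its methods is a hypothetical name, not an HBase class. The hosting region is the one with the greatest start key that is less than or equal to the row.

```java
import java.util.Map;
import java.util.TreeMap;

class RegionLookup {
  // Region start key -> server name. The first region starts at the empty key.
  private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

  void addRegion(String startKey, String server) {
    serverByStartKey.put(startKey, server);
  }

  // Finds the region with the greatest start key <= row and returns its server.
  String serverFor(String row) {
    Map.Entry<String, String> region = serverByStartKey.floorEntry(row);
    if (region == null) {
      throw new IllegalStateException("no region covers row " + row);
    }
    return region.getValue();
  }
}
```

A lookup like `serverFor` is the piece a client would need before it can partition a batch by server.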

Question: Suppose that in the middle of one of these massive updates, the region splits. It is unlikely that the same region server will serve either or both of the children. How will the client handle that?
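
One possible answer, sketched here purely as an assumption and not as what the client actually does: treat a "wrong server" error as a signal to look the row up again and resend. `RelocatingSender`, `WrongServerException` and `Transport` are all hypothetical names introduced for this example.

```java
import java.util.function.Function;

class RelocatingSender {
  // Hypothetical signal that the contacted server no longer hosts the row.
  static class WrongServerException extends Exception {}

  // Hypothetical transport: sends one row update to the named server.
  interface Transport {
    void send(String server, String row) throws WrongServerException;
  }

  // Sends a row, repeating the location lookup before every attempt.
  // Returns the number of attempts used; rethrows the last failure
  // once maxAttempts attempts have all failed.
  static int sendWithRelocation(String row, Function<String, String> locate,
      Transport transport, int maxAttempts) throws WrongServerException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    WrongServerException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      String server = locate.apply(row); // fresh lookup on every attempt
      try {
        transport.send(server, row);
        return attempt;
      } catch (WrongServerException e) {
        last = e; // the region may have split or moved; look it up again
      }
    }
    throw last;
  }
}
```

This only covers a single row. For a grouped batch, the rows that failed would have to be regrouped against the new region layout before being resent.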

> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.0
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.



[jira] Updated: (HBASE-747) Add a simple way to do batch updates of many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-747:
-------------------------------------

    Attachment: hbase-747-simple.patch

Patch implementing the simplest option described. It doesn't break the 0.2 API and leaves room for further improvements to be hidden behind it.

> Add a simple way to do batch updates of many rows
> -------------------------------------------------
>
>                 Key: HBASE-747
>                 URL: https://issues.apache.org/jira/browse/HBASE-747
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.2.1, 0.3.0
>
>         Attachments: hbase-747-simple.patch
>
>
> Add a simple way to do batch updates of many rows as described in HBASE-48.
