Posted to dev@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2008/07/16 23:33:31 UTC

[jira] Created: (HBASE-748) Add an efficient way to batch update many rows

Add an efficient way to batch update many rows
----------------------------------------------

                 Key: HBASE-748
                 URL: https://issues.apache.org/jira/browse/HBASE-748
             Project: Hadoop HBase
          Issue Type: New Feature
          Components: client
    Affects Versions: 0.1.3, 0.2.0
            Reporter: Jean-Daniel Cryans
            Assignee: Jean-Daniel Cryans
             Fix For: 0.3.0


HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.
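A minimal sketch of the client-side grouping step described above. It assumes rows are plain Strings and that the client can ask some locator which server hosts a given row. `RowGrouper` and the locator function are hypothetical stand-ins, not the HBase client API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: rows are Strings, and locateServer stands in for the
// client's cached row-to-server lookup.
final class RowGrouper {
  private RowGrouper() {}

  /** Groups rows by hosting server, keeping the input order inside each group. */
  static Map<String, List<String>> groupByServer(List<String> rows,
                                                  Function<String, String> locateServer) {
    Map<String, List<String>> byServer = new LinkedHashMap<>();
    for (String row : rows) {
      String server = locateServer.apply(row);
      byServer.computeIfAbsent(server, s -> new ArrayList<>()).add(row);
    }
    return byServer;
  }

  public static void main(String[] args) {
    // Hypothetical layout: rows sorting before "m" live on rs1, the rest on rs2.
    Function<String, String> locate = row -> row.compareTo("m") < 0 ? "rs1" : "rs2";
    Map<String, List<String>> groups =
        groupByServer(List.of("apple", "zebra", "kiwi", "pear"), locate);
    System.out.println(groups); // prints {rs1=[apple, kiwi], rs2=[zebra, pear]}
  }
}
```

Each key of the returned map would become one RPC, so the number of round trips drops from one per row to one per region server.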

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-748:
-------------------------------------

    Attachment: hbase-748-v5.patch

This patch has a problem with HBASE-919.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch, hbase-748-v3.patch, hbase-748-v4.patch, hbase-748-v5.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629637#action_12629637 ] 

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

If we are to batch any row operation, I think we will need to create a new class for gets. I don't think a signature like that is what we want:

{code}
public RowResult getRow(final byte [][] row, final byte [][][] columns, final long[] ts, final RowLock[] rl) 
{code}

In fact, a class that looks like BatchUpdate would be what we need. Come to think of it, we could drop many of the ever-expanding client API methods by using such a class. I would also put the row lock inside that class.

Finally, it would be used by the row-batching logic (which is harder to design than I first thought) as a generic row operation.
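A hypothetical sketch of what a BatchUpdate-like class for gets could look like, with the row lock carried inside the object rather than passed as a separate argument. `RowOperation` and `GetOperation` are invented names, the lock is modelled as a plain `long` id, and using `Long.MAX_VALUE` as the "latest" timestamp marker is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only; not existing HBase classes.
interface RowOperation {
  byte[] getRow();
}

final class GetOperation implements RowOperation {
  static final long LATEST_TIMESTAMP = Long.MAX_VALUE; // assumed "newest version" marker
  private final byte[] row;
  private final List<byte[]> columns = new ArrayList<>();
  private long timestamp = LATEST_TIMESTAMP;
  private Long lockId; // null means "no row lock held"

  GetOperation(byte[] row) {
    this.row = Arrays.copyOf(row, row.length);
  }

  // Fluent setters so a caller can describe the whole get in one expression.
  GetOperation addColumn(byte[] column) { columns.add(column); return this; }
  GetOperation setTimestamp(long ts) { this.timestamp = ts; return this; }
  GetOperation setLockId(long id) { this.lockId = id; return this; }

  @Override public byte[] getRow() { return row; }
  List<byte[]> getColumns() { return columns; }
  long getTimestamp() { return timestamp; }
  Long getLockId() { return lockId; }
}
```

With such a class, a batched read could take a single `List<GetOperation>` instead of the parallel `byte[][]`, `byte[][][]`, `long[]` and `RowLock[]` arrays in the signature above, and the batching code could treat gets and updates uniformly through `RowOperation`.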

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636119#action_12636119 ] 

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

bq. No javadoc for HConnection.getRegionServerForManyRows

I still need to do some work on that particular part, so I didn't freeze it.

bq. Do HTable.flushCommits and HTable$ClientScanner.initialize need to be public? Can't they just be protected?

We need to be able to call flushCommits when auto-commit is off. Regarding HTable$ClientScanner.initialize, it isn't changed by my patch, but from what I see TransactionalTable needs it, so it can be neither protected nor private.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620463#action_12620463 ] 

stack commented on HBASE-748:
-----------------------------

Do you think the HTable client should do the sorting and organizing of edits into batches, or should that be done by the calling application? Hypertable would seem to do the former. Reading the Hypertable user list, it looks like they have a mechanism for buffering up edits in the client. When the client update buffer is full, it flushes the edits, sending them in batches with each batch going to the appropriate rangeserver. There is also an explicit flush which you can call to send the current set of edits.
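To send each batch to the appropriate server, the client has to map a row to the region that contains it. Regions cover contiguous, sorted key ranges, so one way to sketch that lookup is a sorted map from region start key to server, queried for the greatest start key not above the row. This assumes String keys and a fully known region layout; `RegionRouter` is a hypothetical name, not part of the HBase client.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch under simplifying assumptions: keys are Strings (HBase uses byte[]),
// and each region covers [startKey, nextStartKey), the first one starting at "".
final class RegionRouter {
  private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

  void addRegion(String startKey, String server) {
    serverByStartKey.put(startKey, server);
  }

  /** The hosting region is the one with the greatest start key <= row. */
  String serverFor(String row) {
    Map.Entry<String, String> e = serverByStartKey.floorEntry(row);
    if (e == null) {
      throw new IllegalStateException("no region covers row " + row);
    }
    return e.getValue();
  }
}
```

On flush, the buffered edits could be passed through `serverFor` and grouped by the result, giving one batch per server.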

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.3.0
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Resolved: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-748.
--------------------------------------

    Resolution: Fixed

Committed to trunk without the fix. PerformanceEvaluation (PE) on my machine scores 128 sec. Thanks to everyone who helped make this sweet optimization!

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch, hbase-748-v3.patch, hbase-748-v4.patch, hbase-748-v5.patch, hbase-748-v6.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634884#action_12634884 ] 

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

{quote}
Should the RowLock be associated with the BatchUpdate rather than being supplied on commit? That would allow us to remove one commit overload, and allow the client to associate the row lock with multiple BatchUpdates for the same row.
{quote}

This is in the scope of HBASE-880. Glad to see that I'm not the only one who saw that problem.

bq. +1 on moving checks into commit (or flushCommits).

That would be in commit, since in flushCommits it's too late, IMO.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634658#action_12634658 ] 

Jim Kellerman commented on HBASE-748:
-------------------------------------

Jean-Daniel Cryans - 24/Sep/08 03:59 PM
{quote}
HTable commits 23 rows to HRS against a region. Let's say that the first one in the 23 is the 1000th in the whole batch to commit.
The region gets split after 10 rows.
At row 11, HRS will handle an NSRE.
HRS returns index 10.
Back in the client, the current index in the batch was at 23.
It receives 10 from HRS, so it backs the index up to the row that failed (index = 1010).
The client refreshes its cache for that row.
The process resumes at that index, e.g. rows from 1010 to 1022 will be retried using a fresh location.
{quote}
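The resume-from-index procedure quoted above can be sketched as a client loop. Assumptions, not taken from the patch: a single server stands in for the per-server calls, the server returns how many leading updates it applied, and a short count means the remaining rows must be retried after refreshing the cached location. `IndexedRetry` and `RegionServerCall` are hypothetical names.

```java
import java.util.List;

// Sketch of the resume-from-index protocol; names and the "applied count"
// return convention are assumptions for illustration.
final class IndexedRetry {
  interface RegionServerCall {
    /** Applies rows[from..to) in order and returns how many were applied. */
    int apply(List<String> rows, int from, int to);
  }

  static void commitAll(List<String> rows, RegionServerCall server,
                        Runnable refreshLocation, int maxRetries) {
    int index = 0;
    int retries = 0;
    while (index < rows.size()) {
      int applied = server.apply(rows, index, rows.size());
      index += applied; // index now points at the first row that was not applied
      if (index < rows.size()) {
        if (++retries > maxRetries) {
          throw new IllegalStateException("gave up at index " + index);
        }
        refreshLocation.run(); // e.g. re-read the region location for rows[index]
      }
    }
  }
}
```

In the scenario above, a first call that applies only 10 of 23 rows leaves the index at the 11th row; the client refreshes and a second call applies the remaining 13.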

Ok, now I get it. I missed that part. Sorry for being dense.

{quote}
This actually works really well but it's not atomic if a row fails, for example, if a value was too long.
{quote}

Well, aside from the transactional region server, I would not expect it to be atomic across rows.
Were you thinking that there may be multiple BatchUpdates for the same row? That is not the best way for a client to behave, in my opinion.

A couple of comments though.
- HTable.flushCommits() seems to ignore the row lock that can be passed to HTable.commit(BatchUpdate, RowLock)
- Should the RowLock be associated with the BatchUpdate rather than being supplied on commit? That would allow us to remove one commit overload, and allow the client to associate the row lock with multiple BatchUpdates for the same row.

+1 on moving checks into commit (or flushCommits). We still fail early, although not as early as we would if the checks were done in BatchUpdate. But as Stack points out, having BatchUpdate require an HTable or HTD would be ugly. At least the request won't be partially processed before failing.

Last comment on the patch: remove the code that is commented out in HTable.commit(BatchUpdate, RowLock).


> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Updated: (HBASE-748) Add an efficient way to batch update many rows

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-748:
------------------------

    Attachment: hbase-748-v4.patch

The problem was in deserializing the BatchUpdate: we'd use the no-arg BatchUpdate constructor, which waterfalls down to the constructor that takes all arguments. That constructor did this:

{code}
this.size = row.length;
{code}

With the no-arg constructor, row was null, so that line threw an NPE, and the NPE wasn't making it out during deserialization.

The attached patch includes the fix and some minor cleanup.
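A simplified, hypothetical stand-in for the bug described above (not the real BatchUpdate class). The no-arg constructor, needed so the object can be created before its fields are filled in during deserialization, delegates with a null row; guarding the size assignment avoids the NPE.

```java
// Simplified stand-in for illustration; the real class has more fields.
final class BatchUpdateSketch {
  private byte[] row;  // set later during deserialization when built via no-arg
  private long size;

  /** Used by deserialization; fields are filled in afterwards. */
  BatchUpdateSketch() {
    this(null);
  }

  BatchUpdateSketch(byte[] row) {
    this.row = row;
    // Before the fix: this.size = row.length;  -> NullPointerException when row == null
    this.size = (row == null) ? 0 : row.length;
  }

  long getSize() { return size; }
  byte[] getRow() { return row; }
}
```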

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch, hbase-748-v3.patch, hbase-748-v4.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Updated: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-748:
-------------------------------------

    Attachment: hbase-748-v6.patch

This patch passes all tests and adds some. Generating javadoc gives no warnings. Please review.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch, hbase-748-v3.patch, hbase-748-v4.patch, hbase-748-v5.patch, hbase-748-v6.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Updated: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-748:
--------------------------------

    Fix Version/s:     (was: 0.18.0)
                   0.19.0

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Updated: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-748:
-------------------------------------

    Attachment: hbase-748-v1.patch

Current status of this issue:

1- The way retries are handled client-side is ugly and cannot be reused for other operations (so HBASE-880 wouldn't fit in).
2- There is duplicated code between the two HTable.commit methods and HRS.batchUpdate.
3- WRE and NSRE should maybe share a common superclass.
4- If any exception is thrown server-side, like values are too long or columns in the wrong format, the state of the whole transaction will be unknown to the user.

But it does give a 2x boost when autoFlush is disabled.
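One way item 3 could look, sketched under assumptions: `WrongRegionException` and `NotServingRegionException` are declared here as local stand-ins rather than HBase's own classes, and `RegionLocationException` is a hypothetical common superclass. A single check then decides whether to refresh the location and retry, while other failures, such as a value that is too long, are not retried.

```java
import java.io.IOException;

// Hypothetical hierarchy for illustration; bodies are minimal stand-ins.
class RegionLocationException extends IOException {
  RegionLocationException(String msg) { super(msg); }
}

class WrongRegionException extends RegionLocationException {
  WrongRegionException(String msg) { super(msg); }
}

class NotServingRegionException extends RegionLocationException {
  NotServingRegionException(String msg) { super(msg); }
}

final class RetryPolicy {
  private RetryPolicy() {}

  /** One test covers both "this row is no longer served here" cases. */
  static boolean shouldRefreshAndRetry(IOException e) {
    return e instanceof RegionLocationException;
  }
}
```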

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634344#action_12634344 ] 

stack commented on HBASE-748:
-----------------------------

On the design outlined in 'Jean-Daniel Cryans - 17/Sep/08 06:24 AM', +1.

On the outline in 'Jean-Daniel Cryans - 19/Sep/08 08:30 AM', I would suggest that the index be thrown for any exception, not just WRE (though, yes, WRE is probably the only Exception that would make use of it). On the sort of RowOperation, does RO implement Comparator? Should it? Otherwise +1 on pseudo-code.
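A sketch of what making the operation comparable could mean: sorting updates by row key so rows headed for the same region end up adjacent. HBase orders row keys lexicographically as unsigned bytes; `RowUpdate` is a hypothetical stand-in, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.util.Arrays;

// Hypothetical stand-in type: orders updates by row key, unsigned and lexicographic.
final class RowUpdate implements Comparable<RowUpdate> {
  private final byte[] row;

  RowUpdate(byte[] row) { this.row = row.clone(); }

  byte[] getRow() { return row; }

  @Override
  public int compareTo(RowUpdate other) {
    // Unsigned so that 0xFF sorts after 0x01; a shorter prefix sorts first.
    return Arrays.compareUnsigned(row, other.row);
  }
}
```

A signed byte comparison would put `0xFF` (which is -1 as a Java `byte`) before `0x01`, which does not match the table's row order.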

Regarding 'Jean-Daniel Cryans - 19/Sep/08 02:19 PM', what's the data saying? That flushing at 1MB is better than flushing at 64MB? What's auto-flush set to, or is this the number when auto-flushing is disabled?

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640681#action_12640681 ] 

stack commented on HBASE-748:
-----------------------------

+1 on patch. Just remove the HBASE-919 workaround (the 5-second pause) before committing.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch, hbase-748-v3.patch, hbase-748-v4.patch, hbase-748-v5.patch, hbase-748-v6.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631769#action_12631769 ] 

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

I gave more thought to stack's idea of buffering the edits and I think it would be nice to implement it. This is how I see it.

We keep an ArrayList of RowUpdates in HTable so that we have a cache per table. It should have a configurable maximum size in bytes, maybe with a default of 64MB? It should also be configurable when creating an HTable.

The RowUpdate class should be able to give us the size of all the BatchOperations it contains. It should be fairly easy to do by asking each BO for its value's length.

We can compute the size of the RowUpdate either at commit time or after each put. I would prefer after each put, so we skip the iteration.
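A minimal sketch of the "compute after each put" option, assuming the size counts only value bytes as proposed above. `SizedRowUpdate` is a hypothetical name; keeping a running total makes asking for the size O(1), with no iteration over the puts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: size is a running total of value bytes, updated per put.
final class SizedRowUpdate {
  private final byte[] row;
  private final List<byte[]> columns = new ArrayList<>();
  private final List<byte[]> values = new ArrayList<>();
  private long size;

  SizedRowUpdate(byte[] row) { this.row = row; }

  void put(byte[] column, byte[] value) {
    columns.add(column);
    values.add(value);
    size += value.length; // maintained here so getSize() never iterates
  }

  long getSize() { return size; }
  byte[] getRow() { return row; }
}
```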

In the case of auto-flushing, I see two ways to detect that the buffer is full. Either at commit time or in a separate thread like the Flusher currently works. The first is very easy to implement but blocks the commits. The second is harder to implement but doesn't block the commits. I think that for 0.19.0 we could implement the first one. 

The other case is that auto-flushing is disabled and then it is the user's responsibility to call something like HTable.flushEdits().
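A sketch of the buffer design in this comment, following its semantics: with auto-flushing on, fullness is checked synchronously inside `commit()` (the simpler, blocking option), and with it off only an explicit flush sends data. Updates are represented only by their byte size, and `sender` stands in for the grouping-and-RPC step; `WriteBuffer` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical per-table write buffer; sizes stand in for real updates.
final class WriteBuffer {
  private final long maxBytes;
  private final boolean autoFlush;
  private final Consumer<List<Long>> sender;
  private final List<Long> pending = new ArrayList<>();
  private long pendingBytes = 0;

  WriteBuffer(long maxBytes, boolean autoFlush, Consumer<List<Long>> sender) {
    this.maxBytes = maxBytes;
    this.autoFlush = autoFlush;
    this.sender = sender;
  }

  void commit(long updateBytes) {
    pending.add(updateBytes);
    pendingBytes += updateBytes;
    if (autoFlush && pendingBytes >= maxBytes) {
      flushCommits(); // blocks the caller while the batch is sent
    }
  }

  /** Explicit flush; the only way data leaves the buffer when auto-flush is off. */
  void flushCommits() {
    if (pending.isEmpty()) {
      return;
    }
    sender.accept(new ArrayList<>(pending));
    pending.clear();
    pendingBytes = 0;
  }

  long getPendingBytes() { return pendingBytes; }
}
```

The second option from this comment, a background flusher thread, would move the size check out of `commit()` at the cost of synchronizing access to `pending`.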

Any comments?

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632826#action_12632826 ] 

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

Some tests I did with a dirty patch:

{code}
auto-flush  Finished sequentialWrite in 304015ms at offset 0 for 1048576 rows
1MB  buffer Finished sequentialWrite in 152825ms at offset 0 for 1048576 rows
16MB buffer Finished sequentialWrite in 151969ms at offset 0 for 1048576 rows
32MB buffer Finished sequentialWrite in 171990ms at offset 0 for 1048576 rows
64MB buffer Finished sequentialWrite in 209194ms at offset 0 for 1048576 rows
{code}
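Worked arithmetic on the timings above (1,048,576 rows in each run): throughput and speedup relative to auto-flush. The 1MB and 16MB buffers come out at roughly 1.99x and 2.00x (304015/152825 and 304015/151969), while 32MB and 64MB fall back to about 1.77x and 1.45x.

```java
// Throughput and speedup computed from the sequentialWrite timings above.
final class BufferSpeedup {
  static final long ROWS = 1_048_576L;
  static final String[] LABELS = {"auto-flush", "1MB", "16MB", "32MB", "64MB"};
  static final long[] MILLIS = {304_015, 152_825, 151_969, 171_990, 209_194};

  static double rowsPerSecond(long millis) {
    return ROWS * 1000.0 / millis;
  }

  /** Ratio of the auto-flush time to the given time; >1 means faster. */
  static double speedupOverAutoFlush(long millis) {
    return (double) MILLIS[0] / millis;
  }

  public static void main(String[] args) {
    for (int i = 0; i < MILLIS.length; i++) {
      System.out.printf("%-10s %8.0f rows/s  %.2fx%n",
          LABELS[i], rowsPerSecond(MILLIS[i]), speedupOverAutoFlush(MILLIS[i]));
    }
  }
}
```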

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634348#action_12634348 ] 

stack commented on HBASE-748:
-----------------------------

On the patch:

I see you've made BatchUpdate comparable already. Scratch my suggestion above. I like the fact that you can ask a BatchUpdate for its size. Should size include row and column sizes too, to cover the case where they are really large?

bq. The region gets split after 10 rows... (From 'Jean-Daniel Cryans - 24/Sep/08 03:59 PM - edited')

I'd been thinking that doing batch updates, we'd take out the splitsAndClosesLock so splits wouldn't happen.  Would this be easier if you moved the array of batch update handlings down into HRegion rather than run it in HRS?
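A sketch of the idea in the question above, assuming `splitsAndClosesLock` behaves as a read/write lock where updates take the read side and a split takes the write side. Holding the read lock for the whole batch means a split has to wait until every row in it is applied. `RegionSketch` is a hypothetical simplification, not HRegion.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical simplification: only the locking pattern is meant to be illustrative.
final class RegionSketch {
  private final ReentrantReadWriteLock splitsAndClosesLock = new ReentrantReadWriteLock();
  private final AtomicInteger appliedRows = new AtomicInteger(); // batches may run concurrently
  private volatile boolean split = false;

  /** Applies the whole batch while holding the read lock, so no split can interleave. */
  int batchUpdate(List<String> rows) {
    splitsAndClosesLock.readLock().lock();
    try {
      appliedRows.addAndGet(rows.size()); // stand-in for writing each row
      return rows.size();
    } finally {
      splitsAndClosesLock.readLock().unlock();
    }
  }

  /** Blocks until no batch holds the read lock. */
  void split() {
    splitsAndClosesLock.writeLock().lock();
    try {
      split = true; // stand-in for the real split work
    } finally {
      splitsAndClosesLock.writeLock().unlock();
    }
  }

  int getAppliedRows() { return appliedRows.get(); }
  boolean isSplit() { return split; }
}
```

Moving the batch loop into the region, as asked above, is what makes this possible: the lock is region-local, so the region server would otherwise have to take and release it per row.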

Minor nitpick: put the declaration and this line -- '+    this.writeBuffer = new ArrayList<BatchUpdate>();' -- together and then add the final keyword (not important... but if you are revising anyway). Similarly for this line -- +    this.currentWriteBufferSize = 0; ... why not declare and do the initial assignment in one line rather than have it split, with the assignment down in the constructor?

More nitpicks:

{code}
+      if (bu.getRow() == null)
+        throw new IllegalArgumentException("update has null row");
{code}

In hadoop coding, folks supply the parens as in:

{code}
+      if (bu.getRow() == null) {
+        throw new IllegalArgumentException("update has null row");
+      }
{code}

Just FYI.

Here's a style comment: rather than do this in flushCommits:

{code}
+    if (!writeBuffer.isEmpty()) {
{code}

... instead do

{code}
+    if (writeBuffer.isEmpty()) {
+      return;
+    }
{code}

Then you don't have the whole method body indented. That gives you more room on a line, and it deals with the isEmpty case immediately rather than letting it last the length of the method body.

bq. 1- The way retries are handled client-side is ugly, cannot be reused for other operations (so HBASE-880 wouldn't fit in).

I don't follow the above, J-D. Are you saying that we can't do batching with 880?

bq. 3- WRE and NSRE maybe should share a common super class.

That's fine by me.

bq. 4- If any exception is thrown server-side, like values are too long or columns in the wrong format, the state of the whole transaction will be unknown to the user.

Could we iterate the BatchUpdate array first before committing to check its wholesomeness so if problem, user knows that nothing was committed?
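A sketch of the all-or-nothing pre-check suggested above: every update is validated before anything is sent, so if one is malformed the user knows nothing was committed. The specific rules and the 1024-byte limit are hypothetical placeholders for the real checks; `BatchValidator` is an invented name.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical validator: rules and limit are placeholders, not HBase's checks.
final class BatchValidator {
  static final int MAX_VALUE_LENGTH = 1024; // hypothetical limit

  static final class Update {
    final String row;
    final String column;
    final byte[] value;

    Update(String row, String column, byte[] value) {
      this.row = row;
      this.column = column;
      this.value = value;
    }
  }

  /** Throws on the first bad update; nothing has been sent at that point. */
  static void validateAll(List<Update> batch) {
    for (int i = 0; i < batch.size(); i++) {
      Update u = batch.get(i);
      if (u.row == null) {
        throw new IllegalArgumentException("update " + i + " has null row");
      }
      if (u.column == null || u.column.indexOf(':') < 0) {
        throw new IllegalArgumentException("update " + i + ": column must be family:qualifier");
      }
      if (u.value == null || u.value.length > MAX_VALUE_LENGTH) {
        throw new IllegalArgumentException("update " + i + ": missing or too-long value");
      }
    }
  }

  static void commitAll(List<Update> batch, Consumer<List<Update>> sender) {
    validateAll(batch);   // fails before any RPC is made
    sender.accept(batch);
  }
}
```

This only covers problems detectable on the client; a region split while the batch is being applied still needs the retry handling discussed in this issue.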

bq. ...but it's not atomic if a row fails, for example, if a value was too long.

What do you mean by the above? That if we have a non-WRE or non-NSRE-like exception, then the batch is dropped?

Good stuff J-D.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure out which rows go to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636130#action_12636130 ] 

Jim Kellerman commented on HBASE-748:
-------------------------------------

Ok, if flushCommits and initialize need to be public, they need javadoc.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634305#action_12634305 ] 

Jim Kellerman commented on HBASE-748:
-------------------------------------

Review of hbase-748-v1.patch

Shouldn't HRS.batchUpdate(final byte[] regionName, BatchUpdate[] b) return "i" if it falls out of the try/catch block?
Currently it returns -1 which indicates (as I understand it) that the request was unsuccessful.

I do not understand how these changes implement retries since getRegionServerForManyRows does not implement them nor does it call getRegionServerWithRetries which does.



> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634501#action_12634501 ] 

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

{quote}
I see you've made BatchUpdate comparable already. Scratch my suggestion above. I like the fact that you can ask a BatchUpdate its size. Should size include row and column sizes too to cover the case where they are really large?
{quote}
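Here is a hypothetical sketch of a size estimate that also counts the row key and column names. `SimpleBatchUpdate` is an illustrative stand-in, not HBase's `BatchUpdate`, and counting the row key once per update is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a batch update on a single row.
class SimpleBatchUpdate {
  private final byte[] row;
  // column name -> value, kept in insertion order
  private final Map<String, byte[]> puts = new LinkedHashMap<>();

  SimpleBatchUpdate(String row) {
    this.row = row.getBytes(StandardCharsets.UTF_8);
  }

  void put(String column, byte[] value) {
    puts.put(column, value);
  }

  // Counts the row key once, plus every column name and value, so very
  // large row keys or column names are reflected in the buffered size.
  long getSize() {
    long size = row.length;
    for (Map.Entry<String, byte[]> e : puts.entrySet()) {
      size += e.getKey().getBytes(StandardCharsets.UTF_8).length;
      size += e.getValue().length;
    }
    return size;
  }
}
```

For example, a row `row1` with a 10-byte value in `info:a` counts as 4 + 6 + 10 = 20 bytes instead of just 10.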

Ok!

{quote}
I'd been thinking that doing batch updates, we'd take out the splitsAndClosesLock so splits wouldn't happen. Would this be easier if you moved the array of batch update handlings down into HRegion rather than run it in HRS?
{quote}

I've been thinking about this too; glad to know you see it that way. Yeah, I should change that.

Regarding the parens nitpick, sorry, it was a mindless copy/paste on my part (look at the trunk version of commit with a single BU).

bq. I don't follow the above J-D? Are you saying that we can't do batching with 880?

Adding batching is not in HBASE-880's scope. That issue only provides a more modifiable API so that batching can be added later (this JIRA). The current patch needs refactoring to abstract the code in flush so that we can give it a get/delete/commit/etc.

Your last two comments are related. Currently we do most of the validation server-side. Moving it into the client uncovers the problem that HTable doesn't know anything about the table it handles. So this is a design issue: where should we do all validations? My proposal is to do it client-side, maybe in a helper class. Then the server-side code will be cleaner, and we won't make RPCs for nothing or repeat validations on retries. This should be in another JIRA.
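Below is a minimal sketch of what such a client-side helper could look like. It runs over the whole batch before any RPC, so an invalid row rejects the batch with nothing committed. Everything here is hypothetical: `UpdateValidator`, the `family:qualifier` check, and the `maxValueLength` limit are illustrative. The sketch also assumes the client can learn the table's column families, which is exactly the information HTable lacks today.

```java
import java.util.List;
import java.util.Set;

// Hypothetical client-side validator, run over the whole batch before any RPC.
class UpdateValidator {
  // Illustrative update: row key, column in "family:qualifier" form, value.
  static final class Update {
    final String row;
    final String column;
    final byte[] value;

    Update(String row, String column, byte[] value) {
      this.row = row;
      this.column = column;
      this.value = value;
    }
  }

  private final Set<String> families; // table's column families (assumed known)
  private final int maxValueLength;   // assumed per-value size limit

  UpdateValidator(Set<String> families, int maxValueLength) {
    this.families = families;
    this.maxValueLength = maxValueLength;
  }

  // Returns the index of the first invalid update, or -1 if the batch is valid.
  int firstInvalid(List<Update> batch) {
    for (int i = 0; i < batch.size(); i++) {
      Update u = batch.get(i);
      int colon = u.column.indexOf(':');
      if (colon <= 0) {
        return i; // not in family:qualifier form
      }
      String family = u.column.substring(0, colon);
      if (!families.contains(family) || u.value.length > maxValueLength) {
        return i; // unknown family or value too long
      }
    }
    return -1;
  }
}
```

Because the check happens once on the client, retries after a WRE or NSRE would not repeat it.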

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636112#action_12636112 ] 

Jim Kellerman commented on HBASE-748:
-------------------------------------

Reviewed patch:

No javadoc for HConnection.getRegionServerForManyRows

Javadoc for HTable.getWriteBuffer, HRegionInterface.batchUpdate(final byte[] regionName, final BatchUpdate[] b), HRegion.batchUpdate(BatchUpdate[]), BatchUpdate.getSize are incomplete

Do HTable.flushCommits and HTable$ClientScanner.initialize need to be public? Can't they just be protected?

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Issue Comment Edited: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634334#action_12634334 ] 

jdcryans edited comment on HBASE-748 at 9/24/08 4:00 PM:
-------------------------------------------------------------------

bq. Shouldn't HRS.batchUpdate(final byte[] regionName, BatchUpdate[] b) return "i" if it falls out of the try/catch block?

That would return the size of the array which we can compare back in the client. Good idea.

{quote}
I do not understand how these changes implement retries since getRegionServerForManyRows does not implement them nor does it call getRegionServerWithRetries which does.
{quote}

As I said in my Sept 23 comment, this part is ugly and needs more work. It implements retries by re-sending the rows that didn't get processed. For example:

HTable commits 23 rows to HRS against a region. Let's say that the first of those 23 is the 1000th row in the whole batch to commit.
The region gets split after 10 rows.
At row 11, HRS catches an NSRE.
HRS returns index 10.
Back in the client, the current index in the batch was at 23.
It receives 10 from HRS, so it moves the index back to the row that failed (index = 1010).
The client refreshes its cache for that row.
Processing resumes at that index, i.e., rows 1010 to 1022 are retried using a fresh location.

This actually works really well, but it's not atomic if a row fails, for example if a value is too long.
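The index arithmetic in the example above can be sketched as follows. `RetryIndexSketch` is hypothetical. It assumes the server returns the number of rows it applied (equivalently, the offset of the first failed row), so a full success returns the chunk size.

```java
// Hypothetical client-side bookkeeping for a partially applied chunk.
class RetryIndexSketch {
  // chunkStart: global index of the first row sent in this RPC
  // chunkSize:  number of rows sent in this RPC
  // returned:   rows applied by the server (chunkSize on full success)
  // Returns the global index where processing should resume.
  static int resumeIndex(int chunkStart, int chunkSize, int returned) {
    if (returned < 0 || returned > chunkSize) {
      throw new IllegalArgumentException("unexpected server result: " + returned);
    }
    return chunkStart + returned;
  }

  // Rows that must be re-sent after refreshing the region location.
  static int rowsToRetry(int chunkSize, int returned) {
    return chunkSize - returned;
  }
}
```

For the example above, `resumeIndex(1000, 23, 10)` gives 1010 and `rowsToRetry(23, 10)` gives 13, i.e., rows 1010 through 1022. Rejecting negative results is a choice made in this sketch, since a -1 "failure" value would otherwise move the index backwards.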

      was (Author: jdcryans):
    bq. Shouldn't HRS.batchUpdate(final byte[] regionName, BatchUpdate[] b) return "i" if it falls out of the try/catch block?

That would return the size of the array which we can compare back in the client. Good idea.

{quote}
I do not understand how these changes implement retries since getRegionServerForManyRows does not implement them nor does it call getRegionServerWithRetries which does.
{quote}

As I said in my Sept 23 comment, this part is ugly and needs more work. It implements retries by re-sending the rows that didn't get processed. For example:

HTable commits 23 rows to HRS against a region. Let's say that the first of those 23 is the 1000th row in the whole batch to commit.
The region gets split after 10 rows.
At row 11, HRS catches an NSRE.
HRS returns index 10.
Back in the client, the current index in the batch was at 23.
It receives 10 from HRS, so it moves the index back to the row that failed (index = 1010).
The client refreshes its cache for that row.
Processing resumes at that index, i.e., rows 1010 to 1022 are retried using a fresh location.

This actually works really well, but it's not atomic...
  
> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632727#action_12632727 ] 

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

Here is how I plan to implement the "many rows to many regions" logic.

In HRS, add a new version of batchUpdate that takes an array of RowUpdate (HBASE-880). This version will simply iterate over the array and call the current batchUpdate. A bit of logic will be added so that if a WRE is thrown, we return the index of the last inserted row.
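Here is a minimal sketch of that server-side loop, under stated assumptions. `Region` and its key-range check stand in for a real region, `applyOne` stands in for the existing single-row batchUpdate, and `WrongRegionException` is a local class rather than HBase's. On full success it returns the array length. Otherwise it returns the index of the first row that was not applied (equal to the number of rows applied), which matches the "HRS returns index 10" example elsewhere in this thread. Returning the index of the last inserted row instead would shift every resume point by one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a server-side batchUpdate over many rows.
class RegionServerSketch {
  // Local stand-in for HBase's WrongRegionException.
  static final class WrongRegionException extends Exception {
    WrongRegionException(String row) {
      super("row not in region: " + row);
    }
  }

  // Stand-in for a region that only accepts rows in [startKey, endKey).
  static final class Region {
    final String startKey;
    final String endKey; // "" means no upper bound
    final List<String> applied = new ArrayList<>();

    Region(String startKey, String endKey) {
      this.startKey = startKey;
      this.endKey = endKey;
    }

    // Stand-in for the existing single-row batchUpdate.
    void applyOne(String row) throws WrongRegionException {
      boolean inRange = row.compareTo(startKey) >= 0
          && (endKey.isEmpty() || row.compareTo(endKey) < 0);
      if (!inRange) {
        throw new WrongRegionException(row);
      }
      applied.add(row);
    }
  }

  // Applies rows in order. Returns rows.length if all were applied,
  // otherwise the index of the first row that belongs to another region.
  static int batchUpdate(Region region, String[] rows) {
    for (int i = 0; i < rows.length; i++) {
      try {
        region.applyOne(rows[i]);
      } catch (WrongRegionException e) {
        return i; // rows 0..i-1 were applied; the client resumes at i
      }
    }
    return rows.length;
  }
}
```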

In HTable, when a flush is triggered, it calls a method that takes an ArrayList of unsorted RowOperation (HBASE-880). The following pseudocode does the rest:

{code}
sort the row operations (called ops)
create a temporary empty list of ops
retrieve the cached region of the first op and mark it as "current"
for i = 0; i < number of ops; i++
  current op is at index i of the array of ops
  add the op to the temporary list
  retrieve the cached region of the following op (if any)
  if current region not equals retrieved region or current op is the last one
    do the operation on region server of current region
    if a WRE is thrown
      retrieve the real region of the op at the index in WRE (becomes the retrieved region)
      reset i to the index of the returned row - 1 in WRE
    the retrieved region is now the current region
    clear the temporary list
{code}
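The pseudocode above could translate to Java roughly as follows. This is a self-contained simulation, not the patch. Regions are identified by start key. The region server is faked by `send`, which applies rows until it meets one outside its region and returns that offset (standing in for the WRE index). The cache refresh assumes regions only split, so the cached start keys are always a subset of the real ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical simulation of the client-side grouping loop; not the patch's code.
// Regions are identified by start key ("" is the first region); a row belongs
// to the region with the greatest start key <= row.
class BatchGroupingSketch {
  final TreeSet<String> cachedStarts; // what the client believes (may be stale)
  final TreeSet<String> realStarts;   // actual layout, assumed to change only by splits
  final List<String> applied = new ArrayList<>();
  int rpcs = 0;

  BatchGroupingSketch(TreeSet<String> cachedStarts, TreeSet<String> realStarts) {
    this.cachedStarts = cachedStarts;
    this.realStarts = realStarts;
  }

  String cachedRegion(String row) {
    return cachedStarts.floor(row);
  }

  // Fake region server: applies rows until one is outside the region and
  // returns its offset (like the index carried by a WRE), else ops.size().
  int send(String regionStart, List<String> ops) {
    rpcs++;
    for (int k = 0; k < ops.size(); k++) {
      if (!regionStart.equals(realStarts.floor(ops.get(k)))) {
        return k;
      }
      applied.add(ops.get(k));
    }
    return ops.size();
  }

  void flush(List<String> unsortedOps) {
    if (unsortedOps.isEmpty()) {
      return;
    }
    List<String> ops = new ArrayList<>(unsortedOps);
    Collections.sort(ops);
    List<String> pending = new ArrayList<>();
    String current = cachedRegion(ops.get(0));
    int chunkStart = 0; // index in ops of pending.get(0)
    for (int i = 0; i < ops.size(); i++) {
      pending.add(ops.get(i));
      boolean last = i == ops.size() - 1;
      String next = last ? null : cachedRegion(ops.get(i + 1));
      if (last || !current.equals(next)) {
        int done = send(current, pending);
        if (done < pending.size()) {
          int failed = chunkStart + done;
          // Refresh the cache for the failed row, then resume from it.
          cachedStarts.add(realStarts.floor(ops.get(failed)));
          next = cachedRegion(ops.get(failed));
          i = failed - 1; // the loop's i++ brings us back to the failed row
        }
        current = next;
        pending.clear();
        chunkStart = i + 1;
      }
    }
  }
}
```

Suppose the cache knows only the "" region while the real table also has a region starting at "m". Flushing a, b, c, n, z then takes two simulated RPCs: the first applies a, b and c and fails at n, and once the cache has learned about "m", the second sends n and z.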

The big trade-off in this algorithm is that I try to limit the number of queries to .META. by using the cache, at the expense of moving potentially big chunks of rows back and forth if the cache is stale. This impact could be reduced if we fetched more .META. rows at each locateRegionInMeta using HBASE-887 instead of using getClosestRowBefore (just a thought). That's what Bigtable does.

Any comments?

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Updated: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-748:
-------------------------------------

    Attachment: hbase-748-v2.patch

Latest version of this patch. If someone wants to try it with different buffer sizes, I'd be glad. For the moment, 12MB seems like a good value.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623159#action_12623159 ] 

Andrew Purtell commented on HBASE-748:
--------------------------------------

In addition to updates, gets can be batched this way as well for performance if a client has a list of row keys available and would like to retrieve them in a group.

Christian Hvitved wrote on hbase-user@

> I was thinking of a method that given an array of keys
> could fetch the rows efficiently. For example by finding
> out which regions and regionservers the keys are located at
> using the metadata. Then concurrently a thread could be
> started for each regionserver containing the keys, and the
> regionserver could find all the rows in one method call. 

I expect this could be done client-side with Futures and Callables.
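Here is a minimal sketch of that idea. It groups the keys by hosting server, submits one `Callable` per server to an `ExecutorService`, and merges the results from the `Future`s. The `serverFor` lookup and the `fetch` function (one multi-row read per server) are hypothetical placeholders for the client's region-location cache and for a multi-row RPC that does not exist in this codebase.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical client-side batched get: one Callable per region server.
class ParallelGetSketch {
  static Map<String, String> getAll(
      List<String> keys,
      Function<String, String> serverFor, // key -> hosting server (assumed lookup)
      BiFunction<String, List<String>, Map<String, String>> fetch) { // per-server read (assumed)
    // Group the keys by the server that hosts them.
    Map<String, List<String>> byServer = new TreeMap<>();
    for (String key : keys) {
      byServer.computeIfAbsent(serverFor.apply(key), s -> new ArrayList<>()).add(key);
    }
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, byServer.size()));
    try {
      List<Future<Map<String, String>>> futures = new ArrayList<>();
      for (Map.Entry<String, List<String>> e : byServer.entrySet()) {
        String server = e.getKey();
        List<String> serverKeys = e.getValue();
        Callable<Map<String, String>> task = () -> fetch.apply(server, serverKeys);
        futures.add(pool.submit(task)); // all servers are queried concurrently
      }
      Map<String, String> result = new HashMap<>();
      for (Future<Map<String, String>> f : futures) {
        try {
          result.putAll(f.get()); // blocks until that server's batch is done
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting for rows", ex);
        } catch (ExecutionException ex) {
          throw new IllegalStateException("batched get failed", ex.getCause());
        }
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }
}
```

Creating a pool per call keeps the sketch self-contained. A real client would more likely reuse a shared executor.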

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.3.0
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Updated: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-748:
-------------------------------------

    Attachment: hbase-748-v3.patch

Patch for the latest trunk. It seems to fail when RPCing.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch, hbase-748-v2.patch, hbase-748-v3.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634346#action_12634346 ] 

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

Regarding WRE: in the patch, it is thrown for both NSRE and WRE. RowOperation is from 880, and it does.

Yes, flushing at 1MB is better than 64MB on my machine. Auto-flush is a boolean; false means the commits are buffered.
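Here is a hypothetical sketch of the buffering semantics described here. With auto-flush on, every commit is sent immediately. With it off, commits accumulate until their total size reaches the write buffer size (1MB, 12MB and 64MB are the values mentioned in this thread). `WriteBufferSketch` and its `sender` callback are illustrative, not the patch's HTable code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side write buffer; not the patch's HTable code.
class WriteBufferSketch {
  private final List<byte[]> writeBuffer = new ArrayList<>();
  private final Consumer<List<byte[]>> sender; // stand-in for the per-server RPCs
  private final long writeBufferSize;          // flush threshold in bytes
  private long currentSize = 0;
  private boolean autoFlush = true;

  WriteBufferSketch(long writeBufferSize, Consumer<List<byte[]>> sender) {
    this.writeBufferSize = writeBufferSize;
    this.sender = sender;
  }

  void setAutoFlush(boolean autoFlush) {
    this.autoFlush = autoFlush;
  }

  // autoFlush == true: every commit is sent right away.
  // autoFlush == false: commits are buffered until writeBufferSize is reached.
  void commit(byte[] update) {
    writeBuffer.add(update);
    currentSize += update.length;
    if (autoFlush || currentSize >= writeBufferSize) {
      flushCommits();
    }
  }

  // Sends everything buffered so far; a no-op when the buffer is empty.
  void flushCommits() {
    if (writeBuffer.isEmpty()) {
      return;
    }
    sender.accept(new ArrayList<>(writeBuffer));
    writeBuffer.clear();
    currentSize = 0;
  }
}
```

In this sketch, a caller that turns auto-flush off must call `flushCommits()` before it is done, or the tail of the buffer is never sent.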

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634334#action_12634334 ] 

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

bq. Shouldn't HRS.batchUpdate(final byte[] regionName, BatchUpdate[] b) return "i" if it falls out of the try/catch block?

That would return the size of the array which we can compare back in the client. Good idea.

{quote}
I do not understand how these changes implement retries since getRegionServerForManyRows does not implement them nor does it call getRegionServerWithRetries which does.
{quote}

As I said in my Sept 23 comment, this part is ugly and needs more work. It implements retries by re-sending the rows that didn't get processed. For example:

HTable commits 23 rows to HRS against a region. Let's say that the first of those 23 is the 1000th row in the whole batch to commit.
The region gets split after 10 rows.
At row 11, HRS catches an NSRE.
HRS returns index 10.
Back in the client, the current index in the batch was at 23.
It receives 10 from HRS, so it moves the index back to the row that failed (index = 1010).
The client refreshes its cache for that row.
Processing resumes at that index, i.e., rows 1010 to 1022 are retried using a fresh location.

This actually works really well, but it's not atomic...

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is to have an enhanced version that will send many rows in a single RPC to each region server. To do this, the client code will have to figure which rows goes to which server, group them accordingly and then send them.
