Posted to dev@hbase.apache.org by "ryan rawson (JIRA)" <ji...@apache.org> on 2009/12/22 06:10:18 UTC

[jira] Created: (HBASE-2066) Perf: parallelize puts

Perf: parallelize puts
----------------------

                 Key: HBASE-2066
                 URL: https://issues.apache.org/jira/browse/HBASE-2066
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: ryan rawson
             Fix For: 0.20.3, 0.21.0


Right now, with tables that have a large region count, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the number of regions in the table.  When N gets large (let's say 1200+), things become very slow.

Instead, if we batch things up using a different RPC and use thread pools, we could see higher performance!

This requires an RPC change...
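
To make the idea concrete, here is a minimal sketch in plain Java (no HBase classes) of batching buffered puts by server and dispatching one task per server on a thread pool. The class name ParallelPutSketch, the locate function, and the simulated RPC are all hypothetical stand-ins; this is not the patch attached to this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch, not HBase code: batch puts per server and send the batches in parallel. */
final class ParallelPutSketch {

    /** Groups row keys by hosting server; locate is a stand-in for a region location lookup. */
    static Map<String, List<String>> groupByServer(List<String> rows, Function<String, String> locate) {
        Map<String, List<String>> batches = new HashMap<>();
        for (String row : rows) {
            batches.computeIfAbsent(locate.apply(row), s -> new ArrayList<>()).add(row);
        }
        return batches;
    }

    /** Sends one batch per server concurrently; returns the number of simulated RPCs issued. */
    static int flush(List<String> rows, Function<String, String> locate, int threads) {
        Map<String, List<String>> batches = groupByServer(rows, locate);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> calls = new ArrayList<>();
            for (List<String> batch : batches.values()) {
                // Each task stands in for one batched RPC carrying every put for one server.
                calls.add(() -> batch.size());
            }
            int sent = 0;
            for (Future<Integer> f : pool.invokeAll(calls)) {
                sent += f.get(); // surfaces a failure from any batch as an exception
            }
            if (sent != rows.size()) {
                throw new IllegalStateException("some puts were not sent");
            }
            return calls.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the number of round trips follows the number of servers rather than the number of regions, which is the point of the proposal.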

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Reopened: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson reopened HBASE-2066:
--------------------------------


This will go into the 0.20 branch, since we now have HBASE-2219 in there.

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-3.patch, HBASE-2066-branch.patch, HBASE-2066-v2.patch, TestBatchPut.java
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Resolved: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson resolved HBASE-2066.
--------------------------------

    Resolution: Fixed

Committed to branch now.

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2, 0.20.3
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: HBASE-2066-20-branch.txt, HBASE-2066-3.patch, HBASE-2066-branch.patch, HBASE-2066-v2.patch, TestBatchPut.java
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Commented: (HBASE-2066) Perf: parallelize puts

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793807#action_12793807 ] 

stack commented on HBASE-2066:
------------------------------

Patch looks great.  

We can't do a version bump in the 0.20 branch.  Adding a new method to the interface w/o version bumping doesn't work, I suppose.  How about a version in 0.20 that doesn't pass an ExecutorService and a timeout, and whose method is named processBatchOfRows rather than processBatchOfPuts?

Any chance of some tests?  

Will this fix help in 0.20 branch?

{code}
@@ -845,8 +855,9 @@ public class HConnectionManager implements HConstants {
 
           // by nature of the map, we know that the start key has to be < 
           // otherwise it wouldn't be in the headMap. 
-          if (KeyValue.getRowComparator(tableName).compareRows(endKey, 0, endKey.length,
-              row, 0, row.length) <= 0) {
+          if (Bytes.equals(endKey, HConstants.EMPTY_END_ROW) ||
+              KeyValue.getRowComparator(tableName).compareRows(endKey, 0, endKey.length,
+              row, 0, row.length) > 0) {
             // delete any matching entry
             HRegionLocation rl =
               tableLocations.remove(matchingRegions.lastKey());
{code}
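
For readers following the hunk above: the added check treats an empty end key as "no upper bound", which is how the last region of a table is marked. A minimal standalone sketch of that range test follows, assuming plain unsigned lexicographic byte comparison; the class and method names are hypothetical, and the real code goes through KeyValue.getRowComparator(tableName) instead.

```java
import java.util.Arrays;

/** Hypothetical sketch: does a row fall inside a region's [startKey, endKey) range? */
final class RegionRangeSketch {
    /** Stand-in for HConstants.EMPTY_END_ROW. */
    static final byte[] EMPTY_END_ROW = new byte[0];

    static boolean contains(byte[] startKey, byte[] endKey, byte[] row) {
        boolean atOrAfterStart = Arrays.compareUnsigned(startKey, row) <= 0;
        // An empty end key marks the last region, which has no upper bound.
        boolean beforeEnd = endKey.length == 0 || Arrays.compareUnsigned(endKey, row) > 0;
        return atOrAfterStart && beforeEnd;
    }
}
```

Without the empty-end-key case, the comparison alone would never match a row in the last region, since an empty key sorts before every non-empty row.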

Do you want to change these:

{code}
+            LOG.debug("Failed all from " + request.address + " due to ExecutionException");
{code}

... so they are instead:

{code}
+            LOG.debug("Failed all from " + request.address, e);
{code}
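
The point of the second form is that the exception object travels with the log record, so the stack trace and cause are not lost. A small sketch of the same pattern, using java.util.logging rather than the logging API the patch uses (an assumption made only to keep the example free of dependencies); the class name is hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: attach the exception to the log call instead of describing it in text. */
final class LogCauseSketch {
    static final Logger LOG = Logger.getLogger("LogCauseSketch");

    /** Logs the failed server address and keeps the throwable on the log record. */
    static void logFailure(String address, ExecutionException e) {
        LOG.log(Level.FINE, "Failed all from " + address, e);
    }
}
```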

Is getCurrentNrHRS called only once, in the HTable constructor?

Looks really good.

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>             Fix For: 0.20.3, 0.21.0
>
>         Attachments: HBASE-2066-branch.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Updated: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-2066:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to trunk.

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-3.patch, HBASE-2066-branch.patch, HBASE-2066-v2.patch, TestBatchPut.java
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Commented: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832821#action_12832821 ] 

ryan rawson commented on HBASE-2066:
------------------------------------

I ran TestBatchPut for a while and inserted 3.3GB of data w/o problems. Ended up with about 4 table splits. No more concurrent exceptions, no major slowdown... the threads got slower as my machine bogged down, but it wasn't the crazy kind of exponential slowdown originally reported.

If there are no complaints, I'm going to commit this as-is.

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-3.patch, HBASE-2066-branch.patch, HBASE-2066-v2.patch, TestBatchPut.java
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Commented: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803014#action_12803014 ] 

ryan rawson commented on HBASE-2066:
------------------------------------

This is much less ambitious than HBASE-1845 and seeks to optimize the Put case only. 

One of the problems with the original HBASE-1845 patch is that it requires a new API to take advantage of it, and thus requires porting code.  Furthermore, HTable has handy things like write buffering, write buffer size settings, etc.  I started with the 1845 patch and realized we also needed a way to parallelize puts in the normal API.  This is much simpler than 1845 because we don't have to line up return codes (there are no return codes for puts, just exceptions due to temporary issues).

In short: this is a drop-in replacement and makes things go fast now. HBASE-1845 requires a new API.
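
As a rough illustration of why this can be a drop-in change, here is a sketch of a client-side write buffer in plain Java: callers keep calling put, the buffer flushes once a size threshold is crossed, and a failed flush shows up as an exception rather than a return code. WriteBufferSketch and its sender callback are hypothetical names, not the HTable implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of a client-side write buffer; not the HTable implementation. */
final class WriteBufferSketch {
    private final List<byte[]> buffer = new ArrayList<>();
    private final long maxBytes;
    private final Consumer<List<byte[]>> sender; // stand-in for the batched send path
    private long bufferedBytes;

    WriteBufferSketch(long maxBytes, Consumer<List<byte[]>> sender) {
        this.maxBytes = maxBytes;
        this.sender = sender;
    }

    /** Callers only ever call put; flushing happens behind it once the buffer is too large. */
    void put(byte[] edit) {
        buffer.add(edit);
        bufferedBytes += edit.length;
        if (bufferedBytes > maxBytes) {
            flushCommits();
        }
    }

    /** Sends everything buffered. If the sender throws, the buffer is left intact for a retry. */
    void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }

    int pending() {
        return buffer.size();
    }
}
```

Because the parallelism lives entirely behind flushCommits, the sender can be swapped from a serial loop to a thread pool without touching calling code.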

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Updated: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-2066:
-------------------------------

    Attachment: HBASE-2066-3.patch

Here is the much-awaited new version. I'll run some tests on it and then commit if things look good.

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-3.patch, HBASE-2066-branch.patch, HBASE-2066-v2.patch, TestBatchPut.java
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Commented: (HBASE-2066) Perf: parallelize puts

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805098#action_12805098 ] 

stack commented on HBASE-2066:
------------------------------

Patch looks good.  Make sure all licenses are 2010 on commit and add some class comment to new classes saying what they do on commit.  You don't up the RPC version?  Otherwise it looks great RR.

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch, HBASE-2066-v2.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Commented: (HBASE-2066) Perf: parallelize puts

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831567#action_12831567 ] 

stack commented on HBASE-2066:
------------------------------

Hey man, commit already!

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch, HBASE-2066-v2.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Commented: (HBASE-2066) Perf: parallelize puts

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802795#action_12802795 ] 

Jeff Hammerbacher commented on HBASE-2066:
------------------------------------------

How does this relate to HBASE-1845?

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Updated: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-2066:
-------------------------------

    Attachment: HBASE-2066-branch.patch

Includes an RPC version bump :-(

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>             Fix For: 0.20.3, 0.21.0
>
>         Attachments: HBASE-2066-branch.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Commented: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831708#action_12831708 ] 

ryan rawson commented on HBASE-2066:
------------------------------------

Looks like a basic thread concurrency problem here.
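
The NullPointerException inside TreeMap.rotateRight in the stack traces reported on this issue is consistent with several threads mutating an unsynchronized TreeMap at once. The sketch below shows one way to make a sorted location cache safe for concurrent use by swapping in ConcurrentSkipListMap. That swap is an assumption for illustration; the committed fix is not shown in this thread and may simply synchronize access instead. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: a sorted location cache that tolerates concurrent put and remove. */
final class LocationCacheSketch {
    private final ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();

    void cacheLocation(String startKey, String server) {
        cache.put(startKey, server);
    }

    /** Drops the entry covering the row, i.e. the one with the greatest start key <= row. */
    void deleteCachedLocation(String row) {
        String key = cache.floorKey(row);
        if (key != null) {
            cache.remove(key);
        }
    }

    int size() {
        return cache.size();
    }

    /** Adds and deletes distinct keys from several threads; returns the final cache size. */
    static int stress(int threads, int keysPerThread) {
        LocationCacheSketch c = new LocationCacheSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < keysPerThread; i++) {
                        String key = "t" + id + "-" + i;
                        c.cacheLocation(key, "server" + id);
                        c.deleteCachedLocation(key);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows anything a worker threw
            }
            return c.size();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```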

Now to the performance issues: the current code uses ONE threadpool for everyone, which is currently statically set to 10 threads.  The original code used a thread pool per HTable and sized it to the number of regionservers; that is impossible to do in HCM because of chicken-and-egg bootstrap problems (the call we'd use calls HCM.<init> which calls ...).

Maybe the threadpool should move back into HTable to support parallelism better?  With 10 worker threads for way more than 10 client threads, yeah, put performance is going to nosedive.
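
A sketch of what a per-table pool sized to the server count could look like, using only java.util.concurrent. The server count is passed in as a plain int because how it is obtained is outside this example; TablePoolSketch and newPoolFor are hypothetical names.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one pool per table handle, with one worker per region server. */
final class TablePoolSketch {

    /** Builds a pool capped at the given server count; idle workers exit after 60 seconds. */
    static ThreadPoolExecutor newPoolFor(int regionServers) {
        int workers = Math.max(1, regionServers);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        // Let core threads time out so an idle table handle does not pin threads.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
```

The trade-off against a single shared pool is more threads overall in exchange for client threads not queueing behind each other's flushes.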

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch, HBASE-2066-v2.patch, TestBatchPut.java
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Commented: (HBASE-2066) Perf: parallelize puts

Posted by "Cosmin Lehene (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831603#action_12831603 ] 

Cosmin Lehene commented on HBASE-2066:
--------------------------------------

Patch fails to apply on trunk.
After manually applying the chunks, I got these while doing puts:

EXCEPTION 1

java.lang.NullPointerException
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.deleteCachedLocation(HConnectionManager.java:889)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfPuts(HConnectionManager.java:1413)
  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:471)
  at TestBatchPut$MyThread.run(TestBatchPut.java:65)


EXCEPTION 2

java.lang.NullPointerException
  at java.util.TreeMap.rotateRight(TreeMap.java:2057)
  at java.util.TreeMap.fixAfterDeletion(TreeMap.java:2217)
  at java.util.TreeMap.deleteEntry(TreeMap.java:2151)
  at java.util.TreeMap.remove(TreeMap.java:585)
  at org.apache.hadoop.hbase.util.SoftValueSortedMap.remove(SoftValueSortedMap.java:104)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.deleteCachedLocation(HConnectionManager.java:897)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfPuts(HConnectionManager.java:1413)
  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:471)
  at TestBatchPut$MyThread.run(TestBatchPut.java:65)


Also, the throughput went down and the max seconds for a put went up (this could also be from the HBase restart).

I'll attach the piece of code I'm using to benchmark it.

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch, HBASE-2066-v2.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Updated: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-2066:
-------------------------------

    Status: Patch Available  (was: Open)

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch, HBASE-2066-v2.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Updated: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-2066:
-------------------------------

    Fix Version/s:     (was: 0.20.3)

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Assigned: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson reassigned HBASE-2066:
----------------------------------

    Assignee: ryan rawson

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Updated: (HBASE-2066) Perf: parallelize puts

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-2066:
-------------------------------

    Attachment: HBASE-2066-v2.patch

This is a trunk version with a test.

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch, HBASE-2066-v2.patch
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...

[jira] Updated: (HBASE-2066) Perf: parallelize puts

Posted by "Cosmin Lehene (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cosmin Lehene updated HBASE-2066:
---------------------------------

    Attachment: TestBatchPut.java

Run it as: TestBatchPut nr_of_threads nr_of_puts_per_call

> Perf: parallelize puts
> ----------------------
>
>                 Key: HBASE-2066
>                 URL: https://issues.apache.org/jira/browse/HBASE-2066
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2066-branch.patch, HBASE-2066-v2.patch, TestBatchPut.java
>
>
> Right now with large region count tables, the write buffer is not efficient.  This is because we issue potentially N RPCs, where N is the # of regions in the table.  When N gets large (lets say 1200+) things become sloowwwww.
> Instead if we batch things up using a different RPC and use thread pools, we could see higher performance!
> This requires a RPC change...
