Posted to dev@hbase.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2008/04/11 23:12:04 UTC

[jira] Created: (HBASE-576) Investigate IPC performance

Investigate IPC performance
---------------------------

                 Key: HBASE-576
                 URL: https://issues.apache.org/jira/browse/HBASE-576
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: ipc
    Affects Versions: 0.1.0, 0.2.0, 0.1.2, 0.1.1
            Reporter: Jim Kellerman


With all file I/O turned off, running the PerformanceEvaluation test of 1,048,576 sequential writes to HBase managed to achieve only 7,285 IPCs per second.

Running PerformanceEvaluation sequential write test modified to do an abort instead of a commit, it was possible to do 68,337 operations per second. We are obviously spending a lot of time doing IPCs. 

We need to investigate to find the bottleneck. Marshalling and unmarshalling? Socket setup and teardown?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636791#action_12636791 ] 

stack commented on HBASE-576:
-----------------------------

Above comment 'stack - 02/Oct/08 10:48 PM' is incorrect.  In fact it's a Client per socket factory rather than a Client per remote host.  A Client goes against many hosts.  For each, the Client keeps a cache of Connections keyed by remote address (and then some).  An invocation on the remote host sends the request and then sleeps until the response comes back.  During sending and receipt of the response the connection is devoted to that call, but otherwise it is available.  Unless the request or response is large, the socket is being 'multiplexed' (if large, we need to chunk the request/response).
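As a rough illustration of the multiplexing described above, here is a hypothetical sketch -- not the actual org.apache.hadoop.ipc.Client code, all names invented -- of the mechanism: each call gets an id, the calling thread parks until a reader thread posts the matching response, and in between the shared connection stays free for other callers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of one socket multiplexed across many caller threads.
public class MultiplexedConnection {
    private final AtomicInteger nextCallId = new AtomicInteger();
    private final Map<Integer, Call> pending = new ConcurrentHashMap<>();

    static final class Call {
        final int id;
        final CountDownLatch done = new CountDownLatch(1);
        volatile byte[] response;
        Call(int id) { this.id = id; }
    }

    // Caller side: register the call, send it, then sleep until the response arrives.
    public byte[] invoke(byte[] request) throws InterruptedException {
        Call call = new Call(nextCallId.incrementAndGet());
        pending.put(call.id, call);
        send(call.id, request);   // writes go out interleaved with other callers'
        call.done.await();        // park this thread; the connection stays usable
        pending.remove(call.id);
        return call.response;
    }

    // Reader side: responses may come back in any order; match them by call id.
    void receive(int callId, byte[] response) {
        Call call = pending.get(callId);
        if (call != null) {
            call.response = response;
            call.done.countDown();
        }
    }

    private void send(int callId, byte[] request) {
        // In a real client this would serialize the id + request onto the shared socket.
    }
}
```

This is why the request ids in the log below come back out of order relative to the sends: the ids, not the thread order, pair responses with waiters.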

Here's a sample from a log with RPC logging enabled plus some logging I added.  The log is from 4 threads running in a single JVM, each running the PerformanceEvaluation test (needs the above patch applied).  The lone number after the class name is the thread name.  Threads were named 0, 1, 2, and 3.  The numbers at the end with the '####' in front are request ids.  See how they are interlaced.

{code}
...
08/10/03 23:16:03 INFO ipc.HBaseClient: 3 org.apache.hadoop.ipc.HBaseClient@da9ea4 start /208.76.44.139:60020
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user sending #####1330
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user got value #####1327
08/10/03 23:16:03 DEBUG ipc.Client: 0 FINISHED WAITING ON ###1327
08/10/03 23:16:03 INFO ipc.HBaseClient: 0 org.apache.hadoop.ipc.HBaseClient@da9ea4 done
08/10/03 23:16:03 DEBUG ipc.HbaseRPC: Call: get 80
08/10/03 23:16:03 INFO ipc.HBaseClient: 0 org.apache.hadoop.ipc.HBaseClient@da9ea4 start /208.76.44.139:60020
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user sending #####1331
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user got value #####1328
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user got value #####1329
08/10/03 23:16:03 DEBUG ipc.Client: 1 FINISHED WAITING ON ###1328
08/10/03 23:16:03 INFO ipc.HBaseClient: 1 org.apache.hadoop.ipc.HBaseClient@da9ea4 done
08/10/03 23:16:03 DEBUG ipc.Client: 2 FINISHED WAITING ON ###1329
08/10/03 23:16:03 DEBUG ipc.HbaseRPC: Call: get 97
08/10/03 23:16:03 INFO ipc.HBaseClient: 2 org.apache.hadoop.ipc.HBaseClient@da9ea4 done
08/10/03 23:16:03 DEBUG ipc.HbaseRPC: Call: get 98
08/10/03 23:16:03 INFO ipc.HBaseClient: 1 org.apache.hadoop.ipc.HBaseClient@da9ea4 start /208.76.44.139:60020
08/10/03 23:16:03 INFO ipc.HBaseClient: 2 org.apache.hadoop.ipc.HBaseClient@da9ea4 start /208.76.44.139:60020
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user sending #####1332
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user sending #####1333
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user got value #####1330
08/10/03 23:16:03 DEBUG ipc.Client: 3 FINISHED WAITING ON ###1330
08/10/03 23:16:03 INFO ipc.HBaseClient: 3 org.apache.hadoop.ipc.HBaseClient@da9ea4 done
08/10/03 23:16:03 DEBUG ipc.HbaseRPC: Call: get 99
08/10/03 23:16:03 INFO ipc.HBaseClient: 3 org.apache.hadoop.ipc.HBaseClient@da9ea4 start /208.76.44.139:60020
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user sending #####1334
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user got value #####1331
08/10/03 23:16:03 DEBUG ipc.Client: 0 FINISHED WAITING ON ###1331
08/10/03 23:16:03 INFO ipc.HBaseClient: 0 org.apache.hadoop.ipc.HBaseClient@da9ea4 done
08/10/03 23:16:03 DEBUG ipc.HbaseRPC: Call: get 113
08/10/03 23:16:03 INFO ipc.HBaseClient: 0 org.apache.hadoop.ipc.HBaseClient@da9ea4 start /208.76.44.139:60020
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user sending #####1335
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user got value #####1332
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user got value #####1333
08/10/03 23:16:03 DEBUG ipc.Client: 1 FINISHED WAITING ON ###1332
08/10/03 23:16:03 INFO ipc.HBaseClient: 1 org.apache.hadoop.ipc.HBaseClient@da9ea4 done
08/10/03 23:16:03 DEBUG ipc.Client: 2 FINISHED WAITING ON ###1333
08/10/03 23:16:03 DEBUG ipc.HbaseRPC: Call: get 178
08/10/03 23:16:03 INFO ipc.HBaseClient: 2 org.apache.hadoop.ipc.HBaseClient@da9ea4 done
08/10/03 23:16:03 DEBUG ipc.HbaseRPC: Call: get 178
08/10/03 23:16:03 INFO ipc.HBaseClient: 1 org.apache.hadoop.ipc.HBaseClient@da9ea4 start /208.76.44.139:60020
08/10/03 23:16:03 INFO ipc.HBaseClient: 2 org.apache.hadoop.ipc.HBaseClient@da9ea4 start /208.76.44.139:60020
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user sending #####1336
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user sending #####1337
08/10/03 23:16:03 DEBUG ipc.Client: IPC Client (47) connection to /208.76.44.139:60020 from an unknown user got value #####1334
08/10/03 23:16:03 DEBUG ipc.Client: 3 FINISHED WAITING ON ###1334
08/10/03 23:16:03 INFO ipc.HBaseClient: 3 org.apache.hadoop.ipc.HBaseClient@da9ea4 done
...
{code}

Hadoop RPC doesn't seem to have a 'big lock' at its core, not unless the request or response is large.  It's pretty concurrent.

Will now look at hacking in grizzly in the least intrusive manner, just to see if grizzly is faster.



[jira] Resolved: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-576.
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.19.0

Resolving.  Was able to improve RPC some after investigation.  If we want to squeeze more juice out of RPC, we need to bring all of the hadoop RPC code local and start hacking on it, or do our own from scratch.  Things to look into would be undoing reflection -- though this seems to be relatively inexpensive going by recent profilings, and it makes extending RPC simple -- plus adding non-blocking I/O to the client side and sending big data in chunks.  A new RPC seems less necessary given the above investigations: we learned that the current HRPC is concurrent to some degree on the client side and that the server side is already non-blocking, which was the reason I thought an nio framework like grizzly was worth considering.



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638121#action_12638121 ] 

stack commented on HBASE-576:
-----------------------------

Did a little informal test.  I put up 4 regionservers in my little cluster, loaded it w/ a million rows, then did random reading against the million rows from a remote client (on a different network).  I tried different numbers of clients.  While the clients ran, I watched them in the profiler and I watched the requests/second up on the master node.  Here's a rough recording of what I saw for requests/second on the master.

1 client - 230/s
4 clients - 630/s
16 clients - 1050/s
32 clients - 2460/s
64 clients - 2770/s
128 clients - 1150/s
256 clients - 1100/s

For the 128 and 256 clients, I could see most threads blocked in the client.  According to the profiler, with 4 or more clients the RPC threads are spending all their time doing i/o on the net.  That 4 clients don't max out the requests/second would seem to say servers can easily carry more than one client request at a time (duh).  When the number of clients goes > 64, the client looks like it starts to trip itself up, spending the bulk of its time blocked.



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637380#action_12637380 ] 

stack commented on HBASE-576:
-----------------------------

Thanks for the patch J-D.  Went in w/ some slop.  I removed an unneeded import.

Any chance of your making a patch to replace all instances of ObjectWritable in HbaseRPC with HbaseObjectWritable?  It's a silly regression on my part, broken in the below commit:

{code}
r679212 | stack | 2008-07-23 15:13:23 -0700 (Wed, 23 Jul 2008) | 1 line

HBASE-770 Update HBaseRPC to match hadoop 0.17 RPC
{code}

I did some more timings.  An instance with 3 threads did 936 random reads/second, which is about twice a single thread and about 2/3rds of 8 threads.





[jira] Updated: (HBASE-576) Investigate IPC performance

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-576:
-------------------------------------

    Attachment: htd.patch

Patch that fixes some object creation in the client that wasted 5% of the CPU.



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637353#action_12637353 ] 

stack commented on HBASE-576:
-----------------------------

Thanks J-D.  Patch looks good.  Pity it couldn't be fixed better, but yeah, that would need a migration script.  As is, it will save a bunch of churn.  Let me commit it.

Looking at rpc, I see I broke it a while back; I removed the very reason we subclass RPC.  I replaced all our carefully planted HbaseObjectWritables with default ObjectWritables.  That means we're sending Strings instead of codes for our parameter names.

So, I did a test where a cluster had 1M rows loaded into 11 regions spread over 3 machines.  A single client could random-read at ~482/second.  Using the above patch and running with 8 threads, it was able to read at 1531/second.  Basic formula: throughput can be multiplied by the # of threads, up to a maximum of the number of cluster members: e.g. if 8 threads but only 3 servers, you can only see a 3X throughput improvement.  If 8 servers are hosting regions, you should see 8X.
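The rule of thumb above can be written down as a back-of-envelope model (a sketch, not a measured law): expected throughput is roughly the single-thread rate times min(threads, servers).  With the 482/s base rate from the comment, 8 threads over 3 servers predicts about 1446/s, in the same ballpark as the measured 1531/s.

```java
// Back-of-envelope scaling model: client threads add throughput only up to
// the number of region servers they can spread across.
public class ThroughputModel {
    static double expectedThroughput(double singleThreadRate, int threads, int servers) {
        return singleThreadRate * Math.min(threads, servers);
    }
}
```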





[jira] Updated: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-576:
------------------------

    Attachment: 576-nomethodname.patch

Patch that passes codes rather than method names over RPC.  Also includes the PE patch and the restoration of HbaseObjectWritable.



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639598#action_12639598 ] 

stack commented on HBASE-576:
-----------------------------

I backported J-D's HTD patch.  Profiling the 0.18 branch, checking whether a key is of the root region is consuming loads of CPU.  This squashes that.



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638033#action_12638033 ] 

stack commented on HBASE-576:
-----------------------------

A patch that puts back HbaseObjectWritable and that sends codes instead of method names across the wire makes the PE test 20% faster writing and reading over 0.18.1.  Patch coming soon.





[jira] Updated: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-576:
------------------------

    Attachment: pe.patch

Here's some changes to PE so it can run multiple clients all up in the one VM; each client runs in its own thread.  Also added an argument that takes how many rows to run per test.
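The shape of what the pe.patch described above enables can be sketched like this -- a hypothetical simplification, not the patch's actual code; the class and the per-row work are illustrative stand-ins:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Several PE-style clients in one JVM, each on its own thread,
// with a configurable row count per client.
public class MultiClientPE {
    public static long run(int clients, int rowsPerClient) throws InterruptedException {
        AtomicLong rowsWritten = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int i = 0; i < clients; i++) {
            pool.execute(() -> {
                for (int row = 0; row < rowsPerClient; row++) {
                    rowsWritten.incrementAndGet();  // stand-in for one sequential-write RPC
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return rowsWritten.get();
    }
}
```

Running all clients in one VM is what makes the interleaved request ids in the earlier log possible: the threads share one connection per region server.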



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637337#action_12637337 ] 

stack commented on HBASE-576:
-----------------------------

Profiling, it looks like the multithreaded client is mostly waiting.  Retrying with everything clean so I can get decent figures on throughput.

One thing I noticed profiling is that ~5% of client-side CPU is spent doing conversions of keys into and out of bytes, booleans, and longs going to HTableDescriptor (HTD isRootRegion/isMetaRegion became a hotspot when we did hacks to make binary keys work).  At a minimum, the keys should be static finals rather than composed on each invocation, and we should be putting native types into our HTD map.  They could even be static final ImmutableBytesWritables.
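The static-finals suggestion above looks something like this sketch -- the class name and key strings are illustrative, not HBase's actual HTD keys:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Precompute key bytes once at class-load time instead of re-serializing
// the string key on every isRootRegion/isMetaRegion check.
public class TableKeys {
    static final byte[] IS_ROOT_KEY = "IS_ROOT".getBytes(StandardCharsets.UTF_8);
    static final byte[] IS_META_KEY = "IS_META".getBytes(StandardCharsets.UTF_8);

    static boolean isRootKey(byte[] candidate) {
        // No allocation on the hot path; just compare against the cached bytes.
        return Arrays.equals(IS_ROOT_KEY, candidate);
    }
}
```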

Another thing to fix in the client is sending the method name over as part of the Invocation.  We've already made it so we don't send the class name of each parameter, by subclassing hadoop RPC and jiggering ObjectWritable.  It wouldn't take much to do the same for the method name.  We spend a bunch of time sending and reading the method name; 10s of percents of CPU.
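The method-name optimization above amounts to both sides agreeing on a code per method at class-load time and sending the code instead of the UTF string.  A sketch, with an illustrative method list (not HBase's actual RPC interface):

```java
import java.util.HashMap;
import java.util.Map;

// One byte over the wire instead of a serialized method-name string.
// Both client and server build the same tables from the same ordered list.
public class MethodCodes {
    private static final Map<String, Byte> NAME_TO_CODE = new HashMap<>();
    private static final Map<Byte, String> CODE_TO_NAME = new HashMap<>();
    static {
        String[] methods = {"get", "put", "deleteAll", "openScanner", "next"};
        for (byte i = 0; i < methods.length; i++) {
            NAME_TO_CODE.put(methods[i], i);
            CODE_TO_NAME.put(i, methods[i]);
        }
    }
    static byte encode(String method) { return NAME_TO_CODE.get(method); }
    static String decode(byte code) { return CODE_TO_NAME.get(code); }
}
```

The catch, as with any code table, is that both sides must be built from the same list in the same order, which ties the wire format to the interface version.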



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588829#action_12588829 ] 

stack commented on HBASE-576:
-----------------------------

We should check out Dennis's patch: HADOOP-3053



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638124#action_12638124 ] 

stack commented on HBASE-576:
-----------------------------

Doing the same again, running the client on the same net as the cluster but with only 3 regionservers, I see that as the number of clients climbs, the sustained throughput jumps around a lot, whereas it's steady when the number of clients is smaller.

1 client - 450/s
3 clients - 930/s
9 clients - 1200/s



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636537#action_12636537 ] 

stack commented on HBASE-576:
-----------------------------

Looking at grizzly, the nio framework, and at our current RPC.

A grizzly drop-in looks straightforward enough; add to grizzly a protocol that can send and receive Writables.  It would be good to keep things like the ping feature in the current RPC and not change the error types, messages, and provocations that we've come to know and love.  The bulk of the current RPC code, which collects up the method name and parameters into an Invoker Writable, would remain, though somewhat recast.

Current RPC has a single ipc Client instance per remote host (the client is fetched from a cache using a hash of the remote address and socketfactory).  Because there is only one Client per remote server, it looks like request/responses will always run in series (though the code would seem to support many callers contending to send requests, each sleeping till its response comes back).  In-series is fine for the usual case.  But if multiple concurrent HTables in the one VM are each trying to do a lookup on .META., say, those invocations run in series too.  Not the end of the world, but could be better (running tests to confirm).

One thing I notice is that the one-Connection-per-remote-host scheme is propagated up into TableServers in the hbase client.  Need to undo that.
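The caching scheme described above can be sketched as follows -- a hypothetical simplification modeled loosely on Hadoop's ipc.Client, not its actual code; all names are illustrative:

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Two levels of caching: a Client per socket factory, and within each
// Client a Connection per remote address.
public class ClientCache {
    private static final Map<Integer, Client> CLIENTS = new ConcurrentHashMap<>();

    // Same socket factory -> same cached Client instance.
    public static Client getClient(Object socketFactory) {
        return CLIENTS.computeIfAbsent(System.identityHashCode(socketFactory),
                k -> new Client());
    }

    public static class Client {
        // Same remote address -> same cached Connection within this Client.
        private final Map<InetSocketAddress, Connection> connections =
                new ConcurrentHashMap<>();
        public Connection getConnection(InetSocketAddress remote) {
            return connections.computeIfAbsent(remote, Connection::new);
        }
    }

    public static class Connection {
        final InetSocketAddress remote;
        Connection(InetSocketAddress remote) { this.remote = remote; }
    }
}
```

Under this scheme every caller to the same region server shares one Connection, which is why the degree of concurrency on that one socket matters so much.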



[jira] Assigned: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-576:
---------------------------

    Assignee: stack



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637562#action_12637562 ] 

stack commented on HBASE-576:
-----------------------------

The numbers below are for a cluster of three regionservers, with the client running on the master -- which was not running a regionserver.

482/s w/ 1 thread
936/s w/ 3 threads
1092/s w/ 4 threads
1531/s w/ 8 threads
1782/s w/ 16 threads

My cluster is small.  That probably has something to do with the less-than-linear scaling.
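For a rough sense of that scaling, the figures above work out to about a 3.7x throughput gain at 16 threads, i.e. roughly 23% parallel efficiency.  A throwaway computation over the listed numbers:

```java
// Parallel-efficiency arithmetic over the throughput figures listed above.
public class ScalingEfficiency {
    public static void main(String[] args) {
        int[] threads = {1, 3, 4, 8, 16};
        double[] opsPerSec = {482, 936, 1092, 1531, 1782};
        double base = opsPerSec[0]; // single-thread baseline
        for (int i = 0; i < threads.length; i++) {
            double speedup = opsPerSec[i] / base;
            double efficiency = speedup / threads[i];
            System.out.printf("%2d threads: %.2fx speedup, %.0f%% efficiency%n",
                    threads[i], speedup, efficiency * 100);
        }
    }
}
```
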



[jira] Commented: (HBASE-576) Investigate IPC performance

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589134#action_12589134 ] 

stack commented on HBASE-576:
-----------------------------

Dennis says the patch does not improve throughput.  He suggests looking at https://grizzly.dev.java.net/.  There is also http://mina.apache.org/features.html.

Let's first study what is taking the time -- serialization, introspection -- and try to speed up hadoop ipc.
