Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2012/05/28 11:48:23 UTC

[jira] [Created] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Lars Hofhansl created HBASE-6116:
------------------------------------

             Summary: Allow parallel HDFS writes for HLogs.
                 Key: HBASE-6116
                 URL: https://issues.apache.org/jira/browse/HBASE-6116
             Project: HBase
          Issue Type: Bug
            Reporter: Lars Hofhansl
            Assignee: Lars Hofhansl


In HDFS-1783 I adapted Dhruba's changes to be used in Hadoop trunk.
This issue will include the necessary reflection changes to optionally enable this for the WALs in HBase.
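
As a rough illustration of what "reflection changes" can mean here: the caller looks up a method by name at runtime and only uses it if the Hadoop build on the classpath provides it. The sketch below is self-contained and does not use the real HDFS-1783 API. The classes NewerStream and OlderStream and the method name setParallelWrites are hypothetical stand-ins invented for this example.

```java
import java.lang.reflect.Method;

final class OptionalFeature {
    // Tries to call target.<methodName>(boolean). Returns true if the call
    // succeeded, false if the method is missing or the call failed.
    static boolean tryEnable(Object target, String methodName, boolean value) {
        try {
            Method m = target.getClass().getMethod(methodName, boolean.class);
            m.invoke(target, value);
            return true;
        } catch (ReflectiveOperationException e) {
            // Older build without the method: keep the default behavior.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryEnable(new NewerStream(), "setParallelWrites", true));
        System.out.println(tryEnable(new OlderStream(), "setParallelWrites", true));
    }
}

// Hypothetical stand-in for a stream class from a newer Hadoop build.
// The method name is invented for this sketch, not taken from HDFS-1783.
class NewerStream {
    boolean parallel = false;

    public void setParallelWrites(boolean on) {
        parallel = on;
    }
}

// Hypothetical stand-in for a stream class from an older build that
// lacks the method entirely.
class OlderStream {
}
```

The point of this design is that the same HBase jar can run against both builds: when the method is absent the lookup fails cleanly and the default (pipelined) write path stays in use.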

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-6116:
----------------------------------

    Attachment: pipelined-vs-parallel-comparison.zip

Attached is a comparison of pipelined vs. parallel sync differences on two identically configured (but separate) 5 slave EC2 clusters. I modified HBase to use a new histogram metric for recording HLog sync latency and then ran a write-dominant workload on each cluster using LoadTestTool for 60 minutes and captured RegionServer metrics at one second intervals.

The first tab of the spreadsheet describes the experiment parameters. The second shows mean, 99th percentile, and standard deviation for pipelined syncs as reported. The third shows mean, 99th percentile, and standard deviation for parallel syncs as reported. The fourth has some simple graphs I threw together for illustration. The remaining tabs contain the detail of the captured metrics for each host.
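
For readers who want to reproduce the three summary statistics from raw latency samples, here is a minimal sketch. It assumes a nearest-rank percentile and a population standard deviation; the thread does not say how the HBase histogram metric computes these, and the sample values in main are made up.

```java
import java.util.Arrays;

final class SyncLatencyStats {
    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Population standard deviation (divides by n, not n - 1).
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sq = 0;
        for (double x : xs) {
            sq += (x - m) * (x - m);
        }
        return Math.sqrt(sq / xs.length);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] xs, double p) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Invented sync latencies in milliseconds, with one slow outlier.
        double[] ms = {6, 7, 6, 8, 7, 6, 9, 7, 6, 40};
        System.out.printf("mean=%.2f p99=%.2f sd=%.2f%n",
            mean(ms), percentile(ms, 99), stdDev(ms));
    }
}
```

A single outlier moves the 99th percentile and the standard deviation far more than the mean, which is why all three are worth reporting side by side.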
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291653#comment-13291653 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

If it makes testing easier I could attach a patch of HDFS-1783 against Hadoop-2.
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401158#comment-13401158 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

Awesome. Thanks!
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404593#comment-13404593 ] 

Andrew Purtell commented on HBASE-6116:
---------------------------------------

Also, one difference is I ran LoadTestTool from the master node.
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401016#comment-13401016 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

@Andy: Fair enough :)
BTW, do you still have the HBase-0.94 patch you made (so I do not have to do the same work)?

                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400195#comment-13400195 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

Again thanks for doing this Andy.
Looks like the latency is indeed reduced greatly, by around 30% - judged by the mean.

The slower instances now hover around 8ms as opposed to 12ms before, and the other instances around 6ms as opposed to 8ms.
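
As a quick check of the arithmetic, the two pairs of figures above work out to about 33% and 25%, which brackets the "around 30%" estimate. The helper name reductionPercent is just for this sketch.

```java
final class LatencyReduction {
    // Relative reduction in percent when going from `before` to `after`.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Mean sync latencies in milliseconds as quoted in the comment.
        System.out.printf("slower instances: %.1f%%%n", reductionPercent(12, 8));
        System.out.printf("other instances:  %.1f%%%n", reductionPercent(8, 6));
    }
}
```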

I'm still trying to interpret the 99th percentile numbers.
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404359#comment-13404359 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

I am having a hard time quantifying any advantage from this patch with my tests in our DEV cluster.
So I no longer think that this is a worthwhile avenue to follow.

I used PerformanceEvaluation. I hacked it to be able to test with smaller packets and/or autoflush enabled, and in no scenario did I see a statistically significant advantage when this patch was enabled.

Will close as "Won't fix" unless somebody else can think of other ways of testing this.

@Ted: Maybe you can run the test you did for multiple WALs?

                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404536#comment-13404536 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

This was with a 6 DN/RS cluster with real HW. :)
Whatever scenario I tried the parallel write path was never faster.

I tried with PerformanceEvaluation and defaults and a presplit table (i.e. 1000 byte values, no autoflush, etc). In that case I basically just saturated the client's network (~92 MB/s on a 1 Gbit link). I tested with --nomapred.
I then tried with a single region, to see if that one RegionServer would see an advantage.
When that did not show any gains, I hacked PerformanceEvaluation to let me use smaller (100 byte) values and to optionally also enable autoflush.
With those settings the network on the client is no longer saturated.
In that case parallel writes were actually slower, which really surprised me, as I had assumed that many individual puts that are all written to the WAL would show a big boost for parallel writes.

It's possible that my testing methodology is flawed.

                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400185#comment-13400185 ] 

Andrew Purtell commented on HBASE-6116:
---------------------------------------

In my experience there is little difference in variability run to run on the same instances as opposed to two parallel runs started around the same time. EC2 is not a great platform for this kind of testing, so it should be run on real hardware to see if the results are replicated there. Or I could try the so-called cluster compute instances. They are expensive, but given the result, confirming it with those could be justified.
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285503#comment-13285503 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

Waiting for HDFS-1783 API to stabilize before I post a patch here.
I don't currently have access to a real cluster, so if someone could do some testing in a real cluster with Hadoop-Trunk and HBase-Trunk, please let me know!

                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404551#comment-13404551 ] 

stack commented on HBASE-6116:
------------------------------

Dumb?: For sure the //writing was enabled?  If saturated network, I could imagine it not making a diff but if spare bandwidth.... I'd think it'd show through.  The //write would make us saturate the network on an interface before a -- (pipeline) write would I suppose.  And it's easy enough for PE to saturate network IIRC anyway, w/o //writes.
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404621#comment-13404621 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

I'll try this on Monday. I'll also run PE from within the DC.
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404614#comment-13404614 ] 

stack commented on HBASE-6116:
------------------------------

@Lars If you remove hbase from the equation, what do you see?  (There is an hfile PE tool beside the PE tool... does same thing IIRC but just w/ hfiles)
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397933#comment-13397933 ] 

Andrew Purtell commented on HBASE-6116:
---------------------------------------

I ported HDFS-1783 and this patch to Hadoop 2.0.1 and HBase 0.94. A stressful test on a 5 slave cluster with LoadTestTool is still stable after 8 hours. I only have EC2 resources so am not sure a relative performance benchmark would be that meaningful, but I could try.
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404563#comment-13404563 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

Yeah... I'll add a bit more diagnostics logging to make 100% sure as this definitely surprised me.
The length (latency) of the pipe from the client to the RS is long (at least in my test) as compared to the length of pipes between the RSs and the DNs.
Also the writing of the blocks is already interleaving with writing to the OS buffers, so the pipeline might not have had as much of an effect as expected.

I'll also test together with durable sync, as this could show a different pattern.
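
The argument about pipe lengths can be made concrete with a toy model. This is only a sketch under strong assumptions: pipelined replication is modeled as the sum of the per-DataNode hop latencies, parallel replication as the slowest single hop, and acknowledgements, disk time, and batching are ignored. All numbers are invented.

```java
final class WritePathModel {
    // Pipelined replication: hops are traversed one after another,
    // so their latencies add up.
    static double pipelinedMs(double[] hopMs) {
        double total = 0;
        for (double h : hopMs) {
            total += h;
        }
        return total;
    }

    // Parallel replication: all replicas are written at once, so the
    // slowest hop determines the latency.
    static double parallelMs(double[] hopMs) {
        double max = 0;
        for (double h : hopMs) {
            max = Math.max(max, h);
        }
        return max;
    }

    // Percentage of end-to-end latency saved by the parallel path once
    // the client-to-RegionServer time is added to both variants.
    static double endToEndSavingPercent(double clientToRsMs, double[] hopMs) {
        double before = clientToRsMs + pipelinedMs(hopMs);
        double after = clientToRsMs + parallelMs(hopMs);
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        double[] hops = {1, 1, 1}; // three replicas, 1 ms per hop (invented)
        System.out.printf("WAL only:            %.1f%%%n", endToEndSavingPercent(0, hops));
        System.out.printf("20 ms client-to-RS:  %.1f%%%n", endToEndSavingPercent(20, hops));
    }
}
```

Under these assumptions, the same 2 ms saving on the WAL write is a large share of the sync latency but a small share of what a distant client measures, which would be consistent with a client-side benchmark showing little difference.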
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400224#comment-13400224 ] 

Andrew Purtell commented on HBASE-6116:
---------------------------------------

I'm going to do this again tomorrow on cluster compute instances.  The results should be cleaner. 
                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404523#comment-13404523 ] 

stack commented on HBASE-6116:
------------------------------

@Lars So, you don't see a perf boost writing in parallel?  Where is the 30% that Andrew was getting on EC2?  You don't see it on your dev cluster?  (You fellas at SF use toy hardware or what?)
                

        

[jira] [Comment Edited] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400075#comment-13400075 ] 

Andrew Purtell edited comment on HBASE-6116 at 6/23/12 11:39 PM:
-----------------------------------------------------------------

Attached is a comparison of pipelined vs. parallel sync differences on two identically configured (but separate) 5 slave EC2 clusters. I modified HBase to use a new histogram metric for recording HLog sync latency and then ran a write-dominant workload on each cluster using LoadTestTool for 60 minutes and captured RegionServer metrics at one second intervals.

The first tab of the spreadsheet describes the experiment parameters. The second shows mean, 99th percentile, and standard deviation for pipelined syncs as reported. The third shows mean, 99th percentile, and standard deviation for parallel syncs as reported. The fourth has some simple graphs I threw together for illustration. The remaining tabs contain the detail of the captured metrics for each host.

Edit: Note all metrics are in milliseconds.
                
                  

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398772#comment-13398772 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

Any datapoint is helpful. :)
Parallel vs. not will be most interesting.
Durable sync on EC2 might be interesting as it might be slow on EC2.

Whatever time permits... I know you're busy. Thanks so much for spending time on this.
We'll be testing too when I'm back in the US.

                

        

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285760#comment-13285760 ] 

Andrew Purtell commented on HBASE-6116:
---------------------------------------

When there is a patch I can try it out on a test cluster in EC2.
                

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400801#comment-13400801 ] 

Andrew Purtell commented on HBASE-6116:
---------------------------------------

bq. I'm going to do this again tomorrow on cluster compute instances. The results should be cleaner.

I should have realized this earlier, but CC instances don't support instance store volumes, only EBS. EBS is IMO a crappy storage subsystem: in my experience instance store is about 2x slower than real hardware, but EBS can be 10x+ slower and highly variable. So this completes what I can do with EC2.
                

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400180#comment-13400180 ] 

Jonathan Hsieh commented on HBASE-6116:
---------------------------------------

Wow, nice!  

Is there any reason why the tests weren't run on exactly the same EC2 instances with different configuration/binaries?  

Is the variation due to EC2 node variation or because of the software changes? Do you think that matters?  



                

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398258#comment-13398258 ] 

Lars Hofhansl commented on HBASE-6116:
--------------------------------------

@Andrew: Awesome!
A comparative benchmark is still useful, I think, but only if it is not too much work, since the standard deviation will be high on EC2. I think you'll see an improvement in a write-heavy test.

                

[jira] [Updated] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6116:
---------------------------------

    Attachment: 6116-v1.txt

Initial patch, which includes HBASE-5954.
This also fixes building HBase trunk with Hadoop trunk (3.0.0-SNAPSHOT).

In order to test, HDFS-1783 needs to be applied to Hadoop (trunk) first.
Then build Hadoop with:
mvn -Pnative -Pdist -Dtar -DskipTests install
And then HBase with:
mvn -DskipTests -Dhadoop.profile=3.0 ...

Parallel writes can be enabled in hbase-site.xml with:
    hbase.regionserver.wal.parallel.writes

Since this patch includes HBASE-5954, durable sync can also be enabled:
    hbase.regionserver.wal.durable.sync
    hbase.regionserver.hfile.durable.sync

(all options can be set to "true")
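
For illustration, a minimal sketch of what the corresponding hbase-site.xml entries might look like, assuming the usual Hadoop-style property layout. The property names are the ones listed above; they should only have an effect on a build that includes this patch (and HDFS-1783 on the Hadoop side), and any subset of them can be set:

```xml
<configuration>
  <!-- Assumption: only recognized with the HBASE-6116 patch applied -->
  <property>
    <name>hbase.regionserver.wal.parallel.writes</name>
    <value>true</value>
  </property>

  <!-- Durable sync options, which come from HBASE-5954 (included in this patch) -->
  <property>
    <name>hbase.regionserver.wal.durable.sync</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.regionserver.hfile.durable.sync</name>
    <value>true</value>
  </property>
</configuration>
```

For a parallel vs. not comparison, only the first property would be toggled between runs, leaving the durable sync options unset.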

@Andy: If your offer to do a quick test in EC2 still stands that'd be awesome!

                

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404592#comment-13404592 ] 

Andrew Purtell commented on HBASE-6116:
---------------------------------------

It's worth double-checking the results on real hardware; those are the ones that matter IMO.
                

[jira] [Commented] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398651#comment-13398651 ] 

Andrew Purtell commented on HBASE-6116:
---------------------------------------

@Lars, sure. Parallel vs. not only, I presume, or are you also interested in the difference with durable sync enabled?
                

[jira] [Updated] (HBASE-6116) Allow parallel HDFS writes for HLogs.

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-6116:
----------------------------------

    Attachment: apurtell-patches.zip

bq. BTW, do you still have the HBase-0.94 patch you made (so I do not have to do the same work)?

Attached as 'apurtell-patches.zip'. The second HBase patch won't apply directly (in this code base the histograms are scaled to milliseconds), but the changes are easy to apply by hand.
                