Posted to common-dev@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2008/10/09 10:12:44 UTC

[jira] Created: (HADOOP-4382) Run Hadoop sort benchmark on Amazon EC2

Run Hadoop sort benchmark on Amazon EC2
---------------------------------------

                 Key: HADOOP-4382
                 URL: https://issues.apache.org/jira/browse/HADOOP-4382
             Project: Hadoop Core
          Issue Type: Test
          Components: contrib/ec2
            Reporter: Tom White
            Assignee: Tom White


By running a benchmark on EC2 we can see how well Hadoop performs, how to tune it, and how performance changes between releases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4382) Run Hadoop sort benchmark on Amazon EC2

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-4382:
--------------------------------

    Hadoop Flags: [Reviewed]

+1


[jira] Updated: (HADOOP-4382) Run Hadoop sort benchmark on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-4382:
------------------------------

    Attachment: hadoop-4382.patch

A script that:

1. Launches a cluster on EC2
2. Waits for the cluster and Hadoop daemons to start
3. Runs a small sort job to warm up the cluster
4. Runs a sort job and emits the job duration
5. Terminates the cluster

On an 8-node cluster, sorting 32GB of data took 2742 seconds with the default hadoop-site.xml that the EC2 scripts use. Better settings should improve on this.
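
To put the figure above in context, here is a rough sketch of the throughput it implies. It assumes 1GB means 1024MB and that all 8 nodes share the work evenly; the function name sort_throughput is made up for illustration and is not part of the patch.

```shell
#!/bin/sh
# Aggregate and per-node sort throughput in MB/s.
# Assumption: 1 GB is taken as 1024 MB.
# Arguments: data size in GB, duration in seconds, number of worker nodes.
sort_throughput() {
  awk -v gb="$1" -v secs="$2" -v nodes="$3" 'BEGIN {
    total = gb * 1024 / secs
    printf "aggregate=%.2f MB/s per_node=%.2f MB/s\n", total, total / nodes
  }'
}

# Figures reported above: 32GB, 2742 seconds, 8 nodes.
sort_throughput 32 2742 8
# prints: aggregate=11.95 MB/s per_node=1.49 MB/s
```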

There are several improvements that could be made to the script, particularly in detecting when the cluster is ready to go: the current script waits until 90% of the nodes are up, then waits 1 minute for Hadoop to start. It would also be good to do multiple runs, discard the first, and compute an average. There are more ideas here: http://www.nabble.com/Auto-shutdown-for-EC2-clusters-td20132561.html
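
As a rough illustration of the two ideas in the paragraph above (the 90% readiness threshold and averaging runs after discarding the first), something like the following might work. This is only a sketch, not the code in the attached patch: the function names are hypothetical, and how the number of running nodes is obtained is left to the caller.

```shell
#!/bin/sh
# Succeeds once at least 90% of the requested nodes are running.
# Integer arithmetic avoids rounding: running/requested >= 90/100.
# Arguments: nodes currently running, nodes requested.
cluster_ready() {
  running=$1
  requested=$2
  [ $(( running * 100 )) -ge $(( requested * 90 )) ]
}

# Prints the integer mean of all durations except the first (warm-up) run.
# Arguments: one duration in seconds per run, in run order.
# Fails if fewer than two runs are given.
average_excluding_first() {
  [ "$#" -gt 1 ] || return 1
  shift                      # discard the first run
  total=0
  for d in "$@"; do
    total=$(( total + d ))
  done
  echo $(( total / $# ))
}
```

For example, cluster_ready 7 8 fails (87.5% is below the threshold) while cluster_ready 8 8 succeeds, and average_excluding_first 100 60 70 80 prints 70. The durations here are made-up values, not measurements.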

This should be a good basis for running a regular EC2 benchmark from Hudson.

Comments welcome.


[jira] Commented: (HADOOP-4382) Run Hadoop sort benchmark on Amazon EC2

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651168#action_12651168 ] 

Nigel Daley commented on HADOOP-4382:
-------------------------------------

Looks good, Tom. A couple of comments:

- should we also run sortvalidation to ensure the sort actually worked?
- what bin dir are you putting the script in?
- perhaps name the script sort-benchmark
- add a line that writes the number of minutes to a file for the Hudson plot, as follows:
{quote}
sort_minutes=`expr ${sort_duration} / 60`
echo "YVALUE=${sort_minutes}" > sort_minutes.properties
{quote}
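
The same idea can be wrapped in a small function, shown here as a sketch only. The function name write_sort_minutes and the optional output-file argument are assumptions for illustration; the YVALUE key and the default file name come from the snippet above.

```shell
#!/bin/sh
# Writes the sort duration in whole minutes to a properties file.
# Arguments: duration in seconds, output file (optional, defaults to
# sort_minutes.properties as in the snippet above).
write_sort_minutes() {
  sort_duration=$1
  out_file=${2:-sort_minutes.properties}
  # Shell integer division truncates, so 2742 seconds becomes 45 minutes.
  sort_minutes=$(( sort_duration / 60 ))
  echo "YVALUE=${sort_minutes}" > "$out_file"
}
```

Calling write_sort_minutes 2742 would leave the line YVALUE=45 in sort_minutes.properties. The arithmetic expansion gives the same result as the expr form without starting an extra process.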


[jira] Commented: (HADOOP-4382) Run Hadoop sort benchmark on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651075#action_12651075 ] 

Tom White commented on HADOOP-4382:
-----------------------------------

I should say that the 8 node cluster used large EC2 instances (and the namenode/jobtracker is not included in the 8 nodes).


[jira] Updated: (HADOOP-4382) Run Hadoop sort benchmark on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-4382:
------------------------------

    Attachment: hadoop-4382-v2.patch

Thanks for the comments Nigel.

New patch incorporating the suggestions. (I've created the patch from the base of Hadoop this time, so the script goes in src/contrib/ec2/bin.)


[jira] Commented: (HADOOP-4382) Run Hadoop sort benchmark on Amazon EC2

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651169#action_12651169 ] 

Nigel Daley commented on HADOOP-4382:
-------------------------------------

Argh, Jira wiki notation ate my code snippet.

{noformat}
sort_minutes=`expr ${sort_duration} / 60`
echo "YVALUE=${sort_minutes}" > sort_minutes.properties
{noformat}


