Posted to dev@pirk.apache.org by Tim Ellison <t....@gmail.com> on 2016/09/15 13:01:50 UTC
Distributed testing on AWS (was: Re: Next short term goal?)
On 14/09/16 13:55, Ellison Anne Williams wrote:
> In the meantime/very near term, we could provide a step-by-step
> AWS/GCP/Azure instructions for bringing up a small cluster, running the
> distributed tests, and debugging. Admittedly, most of this is handled in
> the AWS/GCP/Azure documentation, but, in my experience, the documentation
> is confusing and very time consuming to get through the first time.
So do you advise running bare VMs and installing Hadoop, or running the
AWS Elastic Map Reduce service?
Here's where I've been going so far, but I don't want to start a wiki
entry with instructions if this is the wrong approach altogether...
- Sign up for an AWS account.
https://aws.amazon.com
- Obtain access keys
https://console.aws.amazon.com/iam
- Install aws command-line tool
https://aws.amazon.com/cli
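For what it's worth, one common way to install it is via pip (assuming a
working Python; the platform installers on that page work equally well):

```shell
# Install (or upgrade) the AWS CLI via pip.
pip install --upgrade awscli

# Sanity-check the install before configuring anything.
aws --version
```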
- Configure aws tool
Choose a default region in the EMR group
http://docs.aws.amazon.com/general/latest/gr/rande.html#emr_region
$ aws configure
AWS Access Key ID [None]: AKIAI44QH8DHBEXAMPLE
AWS Secret Access Key [None]: je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: text
- Create an EC2 key pair, and download e.g. "SparkClusterKeys.pem".
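The key pair can also be created from the CLI instead of the console; a
sketch, using the same example key name as below:

```shell
# Create an EC2 key pair and save the private key locally.
aws ec2 create-key-pair \
    --key-name SparkClusterKeys \
    --query 'KeyMaterial' \
    --output text > SparkClusterKeys.pem

# ssh refuses world-readable private keys, so lock down permissions.
chmod 400 SparkClusterKeys.pem
```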
- Create a Spark cluster
$ aws emr create-cluster \
--name "Spark Cluster" \
--release-label emr-5.0.0 \
--applications Name=Spark \
--ec2-attributes KeyName=SparkClusterKeys \
--instance-type m3.xlarge \
--instance-count 3 \
--use-default-roles
This returns a cluster ID, e.g. j-3KVTXXXXXX7UG
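The cluster takes several minutes to provision, so before uploading
anything it's worth waiting until it is actually up (a sketch, reusing
the example cluster ID):

```shell
# Block until the cluster is up and ready to accept work.
aws emr wait cluster-running --cluster-id j-3KVTXXXXXX7UG

# Or poll the state by hand: STARTING -> BOOTSTRAPPING -> RUNNING/WAITING.
aws emr describe-cluster \
    --cluster-id j-3KVTXXXXXX7UG \
    --query 'Cluster.Status.State'
```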
- Upload a JAR file
$ aws emr put --cluster-id j-3KVTXXXXXX7UG \
    --key-pair-file SparkClusterKeys.pem \
    --src apache-pirk-0.0.1-SNAPSHOT-exe.jar
$ aws emr ssh --cluster-id j-3KVTXXXXXX7UG \
    --key-pair-file SparkClusterKeys.pem \
    --command "hadoop jar <pirkJar> org.apache.pirk.test.distributed.DistributedTestDriver -j <full path to pirkJar>"
- Terminate cluster
$ aws emr terminate-clusters --cluster-ids j-3KVTXXXXXX7UG
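Termination is asynchronous, so to be sure the billing really stops it's
worth confirming nothing is left running (again a sketch with the
example cluster ID):

```shell
# Block until the cluster has actually shut down.
aws emr wait cluster-terminated --cluster-id j-3KVTXXXXXX7UG

# Double-check that no clusters are still active.
aws emr list-clusters --active
```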
Look at the charges per hour and think: there may be a better way...
Regards,
Tim
Re: Distributed testing on AWS (was: Re: Next short term goal?)
Posted by Jacob Wilder <ja...@gmail.com>.
I put up my instructions for GCP and AWS on this page: https://pirk.incubator.apache.org/cloud_instructions
I also have prototype instructions for Azure but their HDInsight platform doesn’t yet support Java 8.
Not everything works completely right but it is a start.
On 9/15/16, 09:01, "Tim Ellison" <t....@gmail.com> wrote: