Posted to dev@pirk.apache.org by Tim Ellison <t....@gmail.com> on 2016/09/15 13:01:50 UTC

Distributed testing on AWS (was: Re: Next short term goal?)

On 14/09/16 13:55, Ellison Anne Williams wrote:
> In the meantime/very near term, we could provide a step-by-step
> AWS/GCP/Azure instructions for bringing up a small cluster, running the
> distributed tests, and debugging. Admittedly, most of this is handled in
> the AWS/GCP/Azure documentation, but, in my experience, the documentation
> is confusing and very time consuming to get through the first time.

So do you advise running bare VMs and installing Hadoop, or using the
AWS Elastic MapReduce (EMR) service?

Here's where I've been going so far, but don't want to start a wiki
entry with instructions if this is the wrong approach altogether...

  - Sign-up for an AWS account.
 	https://aws.amazon.com

  - Obtain access keys
 	https://console.aws.amazon.com/iam

  - Install aws command-line tool
 	https://aws.amazon.com/cli

  - Configure aws tool
 Choose a default region in the EMR group
http://docs.aws.amazon.com/general/latest/gr/rande.html#emr_region

 $ aws configure
 AWS Access Key ID [None]: AKIAI44QH8DHBEXAMPLE
 AWS Secret Access Key [None]: je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
 Default region name [None]: us-east-1
 Default output format [None]: text
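
 Before spending money, it is worth sanity-checking the configuration.
 A sketch using standard AWS CLI subcommands (the identity check only
 succeeds if the keys are valid):

 $ # Show the resolved configuration (profile, keys, region)
 $ aws configure list

 $ # Confirm the keys work by asking AWS who we are
 $ aws sts get-caller-identity --output text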

  - Create an EC2 key pair, and download e.g. "SparkClusterKeys.pem".
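
 The key pair can also be created from the CLI instead of the console;
 a sketch, assuming the key name "SparkClusterKeys" used below:

 $ # Create the key pair and save the private key locally
 $ aws ec2 create-key-pair --key-name SparkClusterKeys \
     --query 'KeyMaterial' --output text > SparkClusterKeys.pem

 $ # ssh refuses keys that are world-readable
 $ chmod 400 SparkClusterKeys.pem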

  - Create a Spark cluster

 $ aws emr create-cluster \
   --name "Spark Cluster" \
   --release-label emr-5.0.0 \
   --applications Name=Spark \
   --ec2-attributes KeyName=SparkClusterKeys \
   --instance-type m3.xlarge \
   --instance-count 3 \
   --use-default-roles

 This returns a cluster ID, e.g. j-3KVTXXXXXX7UG
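
 The cluster takes several minutes to come up, and the upload/ssh steps
 below need it running first. A sketch for waiting on it (both
 subcommands are part of the standard EMR CLI):

 $ # Block until the cluster is up and ready to accept work
 $ aws emr wait cluster-running --cluster-id j-3KVTXXXXXX7UG

 $ # Or check the state manually
 $ aws emr describe-cluster --cluster-id j-3KVTXXXXXX7UG \
     --query 'Cluster.Status.State' --output text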

  - Upload a JAR file

 $ aws emr put --cluster-id j-3KVTXXXXXX7UG \
     --key-pair-file SparkClusterKeys.pem \
     --src apache-pirk-0.0.1-SNAPSHOT-exe.jar
 $ aws emr ssh --cluster-id j-3KVTXXXXXX7UG \
     --key-pair-file SparkClusterKeys.pem \
     --command "hadoop jar <pirkJar> \
       org.apache.pirk.test.distributed.DistributedTestDriver \
       -j <full path to pirkJar>"

  - Terminate cluster

 $ aws emr terminate-clusters --cluster-ids j-3KVTXXXXXX7UG


Looking at the charges per hour, I think there may be a better way...
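
One untested idea for keeping the charges down: submit the test run as a
step at creation time and let the cluster shut itself down when the step
finishes, so nothing idles. The --auto-terminate and --steps flags are
from the EMR CLI; the s3://mybucket/... path is a made-up placeholder
for wherever the Pirk jar is staged:

 $ # Create the cluster, run the distributed tests as a step,
 $ # and terminate automatically when the step completes
 $ aws emr create-cluster \
     --name "Pirk Test Cluster" \
     --release-label emr-5.0.0 \
     --applications Name=Spark \
     --ec2-attributes KeyName=SparkClusterKeys \
     --instance-type m3.xlarge \
     --instance-count 3 \
     --use-default-roles \
     --auto-terminate \
     --steps Type=CUSTOM_JAR,Name="Pirk tests",\
       Jar=s3://mybucket/apache-pirk-0.0.1-SNAPSHOT-exe.jar,\
       MainClass=org.apache.pirk.test.distributed.DistributedTestDriver,\
       Args=["-j","s3://mybucket/apache-pirk-0.0.1-SNAPSHOT-exe.jar"]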

Regards,
Tim

Re: Distributed testing on AWS (was: Re: Next short term goal?)

Posted by Jacob Wilder <ja...@gmail.com>.
I put up my instructions for GCP and AWS on this page: https://pirk.incubator.apache.org/cloud_instructions
I also have prototype instructions for Azure but their HDInsight platform doesn’t yet support Java 8. 

Not everything works completely right but it is a start. 

On 9/15/16, 09:01, "Tim Ellison" <t....@gmail.com> wrote:

    <snip>