Posted to user@hadoop.apache.org by Christian Schneider <cs...@gmail.com> on 2013/03/04 16:08:41 UTC

Best Practice: How to start and shutdown a complete cluster or adding nodes when needed (Automated with Java API or Rest) (On EC2)

Hi,
what is the best way to realize this?
In our current scenario we need the cluster only for some overnight
processing.
Therefore it would be good to shut down the cluster once the overnight run
has finished and store the results on S3.

Could you suggest some libraries or services for that? Like Whirr?
Or is Amazon's EMR what we need (but the prices are high...)?

Best Regards,
Christian.

Re: Best Practice: How to start and shutdown a complete cluster or adding nodes when needed (Automated with Java API or Rest) (On EC2)

Posted by John Conwell <jo...@iamjohn.me>.

It depends on a couple of factors.  First, are you developing a product where
customers will need the freedom to choose which cloud provider to use, or
something in-house where you can standardize on one cloud provider (like
AWS)?  And second, do you only need to spin up Hadoop resources, or do you
need other resources on your cloud-based cluster, like Cassandra, Mongo,
SQL-whatever?

If you only need Hadoop and can standardize on AWS (which has the best
prices), then EMR is definitely the way to go. AWS has a full set of APIs for
most languages that let you do everything you need.  If you need other
resources deployed and the flexibility to use different cloud providers,
you'll have to go another route.

I really like a combination of jclouds and Whirr.  I used this previously
to deploy Hadoop, Cassandra, HAProxy, Solr, and Tomcat clusters, set up all
ingress security rules, and run custom bash scripts.  And it runs on most
cloud providers.  The only problem with Whirr is that its development seems
to have slowed down in the past year or so, as one of the primary guys has
moved on to his own product idea.
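
For the Whirr route, a cluster is described in a properties file and then
launched and destroyed from the command line. The fragment below is only a
minimal sketch under assumptions: the file name hadoop.properties, the
cluster name, and the node counts are made up for illustration, and the
credentials are assumed to be in the usual AWS environment variables.

```properties
# hadoop.properties (hypothetical file name)
# Cluster name is an arbitrary label chosen for this example.
whirr.cluster-name=nightly-hadoop

# One master node and five workers; the counts are illustrative only.
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,5 hadoop-datanode+hadoop-tasktracker

# Run on EC2, reading credentials from the environment.
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

# Start the cluster before the overnight run:
#   whirr launch-cluster --config hadoop.properties
# Tear it down once the results have been written to S3:
#   whirr destroy-cluster --config hadoop.properties
```

Wrapping those two commands in a scheduled job would give the start and
shutdown cycle asked about; anything stored only on the cluster's HDFS is
lost on destroy, so the job output needs to be on S3 before that step.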

John

On Mon, Mar 4, 2013 at 7:08 AM, Christian Schneider <
cschneiderpublic@gmail.com> wrote:

> Hi,
> what is the best way to realize this?
> In our current scenario we need the cluster only for some overnight
> processing.
> Therefore it would be good to shut down the cluster once the overnight run
> has finished and store the results on S3.
>
> Could you suggest some libraries or services for that? Like Whirr?
> Or is Amazon's EMR what we need (but the prices are high...)?
>
> Best Regards,
> Christian.
>

-- 

Thanks,
John C
