Posted to dev@rya.apache.org by Adina Crainiceanu <ad...@usna.edu> on 2016/05/10 17:24:04 UTC

what Amazon EC2 instance would be good enough for Rya?

Hi,

Does anyone have advice on what type of Amazon EC2 instance I could
use to run Accumulo and Rya? I plan on getting an Amazon EC2 instance so we
can collaborate more easily and do some experiments - the budget limit is about
$1000 for a year.

I thought that maybe an m3.large or m4.large instance would be good enough -
not sure which one would be better.

Here are links to the different instance types and pricing:
https://aws.amazon.com/ec2/instance-types/
https://aws.amazon.com/ec2/pricing/
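
As a rough sketch of the budget arithmetic above (not an AWS price quote): the
function names are made up for illustration, and the $0.25/hour figure is a
placeholder, not the actual price of any instance type. Only the $1000 yearly
budget comes from this message.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def max_hourly_rate(annual_budget_usd: float, instance_count: int = 1) -> float:
    """Highest per-instance hourly price that keeps always-on instances in budget."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return annual_budget_usd / (HOURS_PER_YEAR * instance_count)


def affordable_hours(annual_budget_usd: float, hourly_price_usd: float) -> float:
    """Number of instance-hours the budget buys at a given hourly price."""
    if hourly_price_usd <= 0:
        raise ValueError("hourly_price_usd must be positive")
    return annual_budget_usd / hourly_price_usd


if __name__ == "__main__":
    # One instance running all year on a $1000 budget.
    print(round(max_hourly_rate(1000.0), 4))  # 0.1142

    # Placeholder price (assumption): an instance billed at $0.25/hour.
    print(affordable_hours(1000.0, 0.25))  # 4000.0
```

So an instance left running around the clock has to cost roughly 11 cents an
hour or less; anything more expensive would need to be stopped when not in use.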



Thank you very much,
Adina

-- 
Dr. Adina Crainiceanu
Associate Professor, Computer Science Department
United States Naval Academy
410-293-6822
adina@usna.edu
http://www.usna.edu/Users/cs/adina/

Re: what Amazon EC2 instance would be good enough for Rya?

Posted by Adina Crainiceanu <ad...@usna.edu>.
Keith,

Thank you very much for sharing your experience. Very useful. I'll try for
the m3 type of instances, as the i2 is too expensive. Thanks for the link
to Zetten. I'll try it out.

Thank you,
Adina

On Tue, May 10, 2016 at 2:35 PM, Keith Turner <ke...@deenlo.com> wrote:

> I have used a few different instance types for Fluo and Accumulo release
> testing.  I don't have any recommendations, but I can share what I have
> done.
>
> I used to use m1.large instances for Accumulo testing because of the low
> price and large amount of local instance storage.  However, with recent
> AMIs (like CentOS 7) dropping support for PVM, I have stopped using these.
> For the most recent Accumulo scale testing I did, I used d2.xlarge
> instances (which have lots of local storage).  These are more expensive so
> I used fewer of them.  I run the Accumulo continuous ingest and random walk
> cluster test suites on these EC2 clusters.  Both generate lots of random
> data.
>
> For Fluo testing I have been using m3.xlarge instances recently.  These
> only have 80 GB of local SSD storage.  I am interested in running tests with
> other instance types like i2.xlarge.  For Fluo testing I run the stress
> test, which generates random data, and Webindex, which uses real data from
> Common Crawl.
>
> Even though the performance per node is not so great, I like using the
> cheaper m1.large and m3.xlarge instance types because I can get more nodes
> for less.  This is nice for finding issues that only occur at scale.
>
> I have not looked into using m4 nodes because they have no local instance
> storage.  A few years ago I did experiments with Accumulo using EBS vs
> local instance storage and found a large difference in performance.  I have
> not revisited that.  I have also not looked into using S3 for HDFS.  If
> anyone has any info on EBS vs S3 vs instance storage for S3 let me know.
> For my purposes, running tests for a few days, I don't care if the data goes
> away when I terminate the cluster.
>
> We created a project called Zetten[1] to automate setting up Accumulo and
> Fluo on EC2.  I use Zetten for all of my Accumulo and Fluo testing now.
>
> [1]: https://github.com/fluo-io/zetten
>
> Keith




Re: what Amazon EC2 instance would be good enough for Rya?

Posted by Keith Turner <ke...@deenlo.com>.
I have used a few different instance types for Fluo and Accumulo release
testing.  I don't have any recommendations, but I can share what I have
done.

I used to use m1.large instances for Accumulo testing because of the low
price and large amount of local instance storage.  However, with recent
AMIs (like CentOS 7) dropping support for PVM, I have stopped using these.
For the most recent Accumulo scale testing I did, I used d2.xlarge
instances (which have lots of local storage).  These are more expensive so
I used fewer of them.  I run the Accumulo continuous ingest and random walk
cluster test suites on these EC2 clusters.  Both generate lots of random
data.

For Fluo testing I have been using m3.xlarge instances recently.  These
only have 80 GB of local SSD storage.  I am interested in running tests with
other instance types like i2.xlarge.  For Fluo testing I run the stress
test, which generates random data, and Webindex, which uses real data from
Common Crawl.

Even though the performance per node is not so great, I like using the
cheaper m1.large and m3.xlarge instance types because I can get more nodes
for less.  This is nice for finding issues that only occur at scale.
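
To make the "more nodes for less" trade-off concrete, here is a minimal sketch
that counts how many nodes a fixed test budget covers for a short-lived
cluster. The function name, the $100 budget, the 72-hour run, and both hourly
prices are placeholders chosen for illustration, not real EC2 rates.

```python
import math


def nodes_for_budget(budget_usd: float, hourly_price_usd: float, run_hours: float) -> int:
    """Whole number of nodes that can run for run_hours within budget_usd."""
    if hourly_price_usd <= 0 or run_hours <= 0:
        raise ValueError("price and run length must be positive")
    cost_per_node = hourly_price_usd * run_hours
    return math.floor(budget_usd / cost_per_node)


if __name__ == "__main__":
    budget = 100.0   # assumed budget for one test run, in USD
    run_hours = 72   # assumed three-day test run

    # Placeholder hourly prices (assumptions, not actual EC2 pricing).
    cheap_price = 0.10
    expensive_price = 0.50

    print(nodes_for_budget(budget, cheap_price, run_hours))      # 13
    print(nodes_for_budget(budget, expensive_price, run_hours))  # 2
```

With these made-up numbers the cheaper type yields a 13-node cluster against a
2-node one, which is the kind of difference that matters when looking for
problems that only show up at scale.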

I have not looked into using m4 nodes because they have no local instance
storage.  A few years ago I did experiments with Accumulo using EBS vs
local instance storage and found a large difference in performance.  I have
not revisited that.  I have also not looked into using S3 for HDFS.  If
anyone has any info on EBS vs S3 vs instance storage for S3 let me know.
For my purposes, running tests for a few days, I don't care if the data goes
away when I terminate the cluster.

We created a project called Zetten[1] to automate setting up Accumulo and
Fluo on EC2.  I use Zetten for all of my Accumulo and Fluo testing now.

[1]: https://github.com/fluo-io/zetten

Keith





