Posted to user@hadoop.apache.org by José Luis Larroque <la...@gmail.com> on 2015/10/18 21:03:39 UTC

Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution

Hi all!

I have started using Hadoop on AWS, and a big question has come up.

I'm using a MapR distribution of Hadoop 2.4.0 on AWS. I have already tried
some trivial examples, and before moving forward I have one question.

What is the best option for using Hadoop on AWS?
- Build it from scratch on an EC2 instance
- Use the MapR distribution of Hadoop
- Use the Amazon distribution of Hadoop

Sorry if my question is too broad.

Bye!
Jose

Re: Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution

Posted by José Luis Larroque <la...@gmail.com>.

Thanks for your answers!

@Jonathan: Yes! I have already looked at AWS EMR, but I was trying to
compare the benefits of using it against building from scratch on an EC2
instance (I found tutorials for all of these options).
@jay vyas: Thanks jay, but I need to use AWS, and using BigTop doesn't seem
like the right option. I'm trying to keep things simple, because I don't
have much experience with these technologies.

Any other answers are welcome!

Bye!
Jose

2015-10-19 12:37 GMT-03:00 jay vyas <ja...@gmail.com>:

> Also, ASF BigTop packages hadoop for you.
>
> You can always grab our releases
> http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/
>
> We package pig, spark, hive, hbase, ....
>
> It's not hard to set up a bigtop build server, as we have dockerized the
> packaging of both RPM and Deb packages, and you can experiment locally with
> this stuff using the vagrant recipes.
>
>
>
> On Mon, Oct 19, 2015 at 6:26 AM, Jonathan Aquilina <
> jaquilina@eagleeyet.net> wrote:
>
>> Hey Jose
>>
>> Have you looked at Amazon EMR (Elastic MapReduce)? Where I work we have
>> used it, and when you provision the EMR cluster you can use custom jars
>> like the one you mentioned.
>>
>> In terms of storage, you can use HDFS if you are going to keep a
>> persistent cluster. If not, you can store your data in an Amazon S3 bucket.
>>
>> The documentation for EMR is really good. We did this at the beginning
>> of this year, and at that time they supported Hadoop 2.6.
>>
>> In my honest opinion you are giving yourself a lot of extra work for
>> nothing just to get going with Hadoop. Try out EMR with a temporary
>> cluster and go from there. I managed to tool up and learn how to work
>> with EMR in a week.
>>
>> Sent from my iPhone
>>
>> On 19 Oct 2015, at 02:10, José Luis Larroque <la...@gmail.com>
>> wrote:
>>
>> Thanks for your answer Anders.
>>
>> - The amount of data that I'm going to process is about the size of
>> Wikipedia (I will use a dump).
>> - I already have the basics of Hadoop (I hope); I have a local multinode
>> cluster set up and I have already run some algorithms.
>> - Because the amount of data is significant, I believe that I should use
>> several nodes.
>>
>> Maybe another thing to consider is that I'm running Giraph on
>> top of the selected Hadoop distribution/EC2.
>>
>> Bye!
>> Jose
>>
>> 2015-10-18 18:53 GMT-03:00 Anders Nielsen <
>> anders.shinde.nielsen@gmail.com>:
>>
>>> Dear Jose,
>>>
>>> It will help people answer your question if you specify your goals:
>>>
>>> - If you are doing it to learn how to USE a running Hadoop, then go for
>>> one of the prebuilt distributions (Amazon or MapR).
>>> - If you are doing it to learn more about setting up and administering
>>> Hadoop, then you are better off setting everything up from scratch on EC2.
>>> - Do you need to run on many nodes, or just one node to test some
>>> MapReduce scripts on a small data set?
>>>
>>> Regards,
>>>
>>> Anders
>>>
>>>
>>>
>>>
>>> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <
>>> larroquester@gmail.com> wrote:
>>>
>>>> Hi all !
>>>>
>>>> I started to use hadoop with aws, and a big question appears in front
>>>> of me!
>>>>
>>>> I'm using a MapR distribution, for hadoop 2.4.0 in AWS. I already tried
>>>> some trivial examples, and before moving forward i have one question.
>>>>
>>>> What is the better option for using Hadoop on AWS?
>>>> - Build it from scratch on a EC2 instance
>>>> - Use MapR distribution of Hadoop
>>>> - Use Amazon distribution of Hadoop
>>>>
>>>> Sorry if my question is too broad.
>>>>
>>>> Bye!
>>>> Jose
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> jay vyas
>

Re: Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution

Posted by jay vyas <ja...@gmail.com>.

Also, ASF BigTop packages hadoop for you.

You can always grab our releases
http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/

We package pig, spark, hive, hbase, ....

It's not hard to set up a bigtop build server, as we have dockerized the
packaging of both RPM and Deb packages, and you can experiment locally with
this stuff using the vagrant recipes.



On Mon, Oct 19, 2015 at 6:26 AM, Jonathan Aquilina <ja...@eagleeyet.net>
wrote:

> Hey Jose
>
> Have you looked at Amazon emr ( elastic map reduce) where I work we have
> used it and when you provision the emr instance you can use custom jars
> like the one you mentioned.
>
> In terms of storage you can use either hdfs, if you are going to keep a
> persistent cluster. If not you can store your data in an Amazon s3 bucket.
>
> Documentation for emr is really good. At the time when we did this and
> this was at the beginning of this year and they supported Hadoop 2.6.
>
> In my honest opinion you are giving yourself a lot of extra work for
> nothing to get us in Hadoop. Try out emr with temporary cluster and go from
> there. I managed to tool up and learn how to work with emr in a week.
>
> Sent from my iPhone
>
> On 19 Oct 2015, at 02:10, José Luis Larroque <la...@gmail.com>
> wrote:
>
> Thanks for your answer Anders.
>
> -The amount of data that i'm going to manipulate it's like the wikipedia
> (i will use a dump)
> - I already have the basics of hadoop (i hope), i have a local multinode
> cluster setup and i already executed some algorithms.
> - Because the amount of data its important, i believe that i should use
> several nodes.
>
> Maybe another option to considerate should be that i'm running Giraph on
> top of the selected hadoop distribution/EC2.
>
> Bye!
> Jose
>
> 2015-10-18 18:53 GMT-03:00 Anders Nielsen <anders.shinde.nielsen@gmail.com
> >:
>
>> Dear Jose,
>>
>> It will help people answer your question if you specify your goals :
>>
>> -If you do it to learn how to USE a running Hadoop then go for one of the
>> prebuilt distributions (Amazon or MapR)
>> -If you do it to learn more about the setting up and administrating
>> Hadoop then you are better off setting everything up from scratch on EC2.
>> -Do you need to run on many nodes or just a 1 node to test some Mapreduce
>> scripts on a small data set?
>>
>> Regards,
>>
>> Anders
>>
>>
>>
>>
>> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <
>> larroquester@gmail.com> wrote:
>>
>>> Hi all !
>>>
>>> I started to use hadoop with aws, and a big question appears in front of
>>> me!
>>>
>>> I'm using a MapR distribution, for hadoop 2.4.0 in AWS. I already tried
>>> some trivial examples, and before moving forward i have one question.
>>>
>>> What is the better option for using Hadoop on AWS?
>>> - Build it from scratch on a EC2 instance
>>> - Use MapR distribution of Hadoop
>>> - Use Amazon distribution of Hadoop
>>>
>>> Sorry if my question is too broad.
>>>
>>> Bye!
>>> Jose
>>>
>>>
>>>
>>>
>>>
>>
>


-- 
jay vyas

Re: Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution

Posted by Jonathan Aquilina <ja...@eagleeyet.net>.

Hey Jose

Have you looked at Amazon EMR (Elastic MapReduce)? Where I work we have used it, and when you provision the EMR cluster you can use custom jars like the one you mentioned.

In terms of storage, you can use HDFS if you are going to keep a persistent cluster. If not, you can store your data in an Amazon S3 bucket.

The documentation for EMR is really good. We did this at the beginning of this year, and at that time they supported Hadoop 2.6.

In my honest opinion you are giving yourself a lot of extra work for nothing just to get going with Hadoop. Try out EMR with a temporary cluster and go from there. I managed to tool up and learn how to work with EMR in a week.
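
For anyone wanting to try the temporary-cluster route, here is a rough sketch of what the launch command might look like. It only builds and prints an `aws emr create-cluster` command line; it does not run anything. The cluster name, bucket, jar path, instance type, node count and release label are all placeholders assumed for illustration, not values from this thread, so check them against the current EMR documentation before using them.

```python
import shlex


def emr_create_cluster_argv(name, jar_uri, jar_args,
                            release_label="emr-4.1.0",   # assumed placeholder
                            instance_type="m3.xlarge",   # assumed placeholder
                            instance_count=3):           # assumed placeholder
    """Build (but do not run) an `aws emr create-cluster` command line.

    The cluster is transient: it runs one custom-jar step and then
    terminates, so input and output are expected to live in S3, not HDFS.
    """
    step = (
        "Type=CUSTOM_JAR,Name=custom-jar-step,"
        "ActionOnFailure=TERMINATE_CLUSTER,"
        f"Jar={jar_uri},Args=[{','.join(jar_args)}]"
    )
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", release_label,
        "--applications", "Name=Hadoop",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--use-default-roles",
        "--auto-terminate",   # shut the cluster down once the step finishes
        "--steps", step,
    ]


if __name__ == "__main__":
    # Hypothetical bucket and jar names, for illustration only.
    argv = emr_create_cluster_argv(
        name="hadoop-test",
        jar_uri="s3://my-bucket/jars/my-job.jar",
        jar_args=["s3://my-bucket/input/", "s3://my-bucket/output/"],
    )
    print(" ".join(shlex.quote(a) for a in argv))
```

Because the cluster terminates after the step, anything left in HDFS is lost, which is why the sketch points both input and output at S3. For a persistent cluster you would drop `--auto-terminate` and could keep data in HDFS instead.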

Sent from my iPhone

> On 19 Oct 2015, at 02:10, José Luis Larroque <la...@gmail.com> wrote:
> 
> Thanks for your answer Anders.
> 
> -The amount of data that i'm going to manipulate it's like the wikipedia (i will use a dump)
> - I already have the basics of hadoop (i hope), i have a local multinode cluster setup and i already executed some algorithms.
> - Because the amount of data its important, i believe that i should use several nodes.
> 
> Maybe another option to considerate should be that i'm running Giraph on top of the selected hadoop distribution/EC2.
> 
> Bye!
> Jose
> 
> 2015-10-18 18:53 GMT-03:00 Anders Nielsen <an...@gmail.com>:
>> Dear Jose, 
>> 
>> It will help people answer your question if you specify your goals :
>> 
>> -If you do it to learn how to USE a running Hadoop then go for one of the prebuilt distributions (Amazon or MapR)
>> -If you do it to learn more about the setting up and administrating Hadoop then you are better off setting everything up from scratch on EC2.
>> -Do you need to run on many nodes or just a 1 node to test some Mapreduce scripts on a small data set?
>> 
>> Regards, 
>> 
>> Anders
>> 
>> 
>> 
>> 
>>> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <la...@gmail.com> wrote:
>>> Hi all !
>>> 
>>> I started to use hadoop with aws, and a big question appears in front of me!
>>> 
>>> I'm using a MapR distribution, for hadoop 2.4.0 in AWS. I already tried some trivial examples, and before moving forward i have one question.
>>> 
>>> What is the better option for using Hadoop on AWS?
>>> - Build it from scratch on a EC2 instance 
>>> - Use MapR distribution of Hadoop
>>> - Use Amazon distribution of Hadoop
>>> 
>>> Sorry if my question is too broad.
>>> 
>>> Bye!
>>> Jose
>

Re: Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution

Posted by Jonathan Aquilina <ja...@eagleeyet.net>.

Hey Jose

Have you looked at Amazon emr ( elastic map reduce) where I work we have used it and when you provision the emr instance you can use custom jars like the one you mentioned. 

In terms of storage you can use either hdfs, if you are going to keep a persistent cluster. If not you can store your data in an Amazon s3 bucket. 

Documentation for emr is really good. At the time when we did this and this was at the beginning of this year and they supported Hadoop 2.6. 

In my honest opinion you are giving yourself a lot of extra work for nothing to get us in Hadoop. Try out emr with temporary cluster and go from there. I managed to tool up and learn how to work with emr in a week.

Sent from my iPhone

> On 19 Oct 2015, at 02:10, José Luis Larroque <la...@gmail.com> wrote:
> 
> Thanks for your answer Anders.
> 
> -The amount of data that i'm going to manipulate it's like the wikipedia (i will use a dump)
> - I already have the basics of hadoop (i hope), i have a local multinode cluster setup and i already executed some algorithms.
> - Because the amount of data its important, i believe that i should use several nodes.
> 
> Maybe another option to considerate should be that i'm running Giraph on top of the selected hadoop distribution/EC2.
> 
> Bye!
> Jose
> 
> 2015-10-18 18:53 GMT-03:00 Anders Nielsen <an...@gmail.com>:
>> Dear Jose, 
>> 
>> It will help people answer your question if you specify your goals :
>> 
>> -If you do it to learn how to USE a running Hadoop then go for one of the prebuilt distributions (Amazon or MapR)
>> -If you do it to learn more about the setting up and administrating Hadoop then you are better off setting everything up from scratch on EC2.
>> -Do you need to run on many nodes or just a 1 node to test some Mapreduce scripts on a small data set?
>> 
>> Regards, 
>> 
>> Anders
>> 
>> 
>> 
>> 
>>> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <la...@gmail.com> wrote:
>>> Hi all !
>>> 
>>> I started to use hadoop with aws, and a big question appears in front of me!
>>> 
>>> I'm using a MapR distribution, for hadoop 2.4.0 in AWS. I already tried some trivial examples, and before moving forward i have one question.
>>> 
>>> What is the better option for using Hadoop on AWS?
>>> - Build it from scratch on a EC2 instance 
>>> - Use MapR distribution of Hadoop
>>> - Use Amazon distribution of Hadoop
>>> 
>>> Sorry if my question is too broad.
>>> 
>>> Bye!
>>> Jose
>

Re: Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution

Posted by José Luis Larroque <la...@gmail.com>.

Thanks for your answer, Anders.

- The amount of data that I'm going to manipulate is about the size of
Wikipedia (I will use a dump).
- I already have the basics of Hadoop (I hope); I have a local multi-node
cluster set up and I have already run some algorithms.
- Because the amount of data is significant, I believe that I should use
several nodes.

Maybe another point to consider is that I'm running Giraph on
top of the selected Hadoop distribution/EC2.

Bye!
Jose

2015-10-18 18:53 GMT-03:00 Anders Nielsen <an...@gmail.com>:

> Dear Jose,
>
> It will help people answer your question if you specify your goals :
>
> -If you do it to learn how to USE a running Hadoop then go for one of the
> prebuilt distributions (Amazon or MapR)
> -If you do it to learn more about the setting up and administrating Hadoop
> then you are better off setting everything up from scratch on EC2.
> -Do you need to run on many nodes or just a 1 node to test some Mapreduce
> scripts on a small data set?
>
> Regards,
>
> Anders
>
>
>
>
> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <
> larroquester@gmail.com> wrote:
>
>> Hi all !
>>
>> I started to use hadoop with aws, and a big question appears in front of
>> me!
>>
>> I'm using a MapR distribution, for hadoop 2.4.0 in AWS. I already tried
>> some trivial examples, and before moving forward i have one question.
>>
>> What is the better option for using Hadoop on AWS?
>> - Build it from scratch on a EC2 instance
>> - Use MapR distribution of Hadoop
>> - Use Amazon distribution of Hadoop
>>
>> Sorry if my question is too broad.
>>
>> Bye!
>> Jose
>>
>>
>>
>>
>>
>

Re: Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution

Posted by Anders Nielsen <an...@gmail.com>.

Dear Jose,

It will help people answer your question if you specify your goals:

- If you are doing it to learn how to USE a running Hadoop, then go for one
of the prebuilt distributions (Amazon or MapR).
- If you are doing it to learn more about setting up and administering
Hadoop, then you are better off setting everything up from scratch on EC2.
- Do you need to run on many nodes, or just one node to test some MapReduce
scripts on a small data set?

Regards,

Anders




On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <larroquester@gmail.com> wrote:

> Hi all !
>
> I started to use hadoop with aws, and a big question appears in front of
> me!
>
> I'm using a MapR distribution, for hadoop 2.4.0 in AWS. I already tried
> some trivial examples, and before moving forward i have one question.
>
> What is the better option for using Hadoop on AWS?
> - Build it from scratch on a EC2 instance
> - Use MapR distribution of Hadoop
> - Use Amazon distribution of Hadoop
>
> Sorry if my question is too broad.
>
> Bye!
> Jose
>
>
>
>
>
