Posted to mapreduce-user@hadoop.apache.org by Krish Donald <go...@gmail.com> on 2015/03/07 07:11:56 UTC

What skills to Learn to become Hadoop Admin

Hi,

I would like to enter the Big Data world as a Hadoop admin.  I have set up
a 7-node cluster using Ambari, Cloudera Manager, and Apache Hadoop, and I
have installed services such as Hive, Oozie, and ZooKeeper.

I have done a web log integration using Flume and Twitter sentiment
analysis.

What other skills should I learn?

Thanks
Krish

Re: What skills to Learn to become Hadoop Admin

Posted by max scalf <or...@gmail.com>.
Hi Jay,

Is there a blog post or anything else that walks through setting up the
BigPetStore application?  I looked at the README in the Git repository and
was a little lost.  Maybe that's because I am new to Hadoop.

On Sat, Mar 7, 2015 at 10:34 AM, jay vyas <ja...@gmail.com>
wrote:

> Setting up vendor distros is a great first step.
>
> 1) Running TeraSort and benchmarking is a good next step.  You can also run
> larger, full-stack Hadoop applications like BigPetStore, which we curate
> here: https://github.com/apache/bigtop/tree/master/bigtop-bigpetstore/
>
> 2) Write some MapReduce or Spark jobs that write data to a persistent
> transactional store, such as Solr or HBase.  This is a hugely important
> part of real-world Hadoop administration, where you will encounter problems
> like running out of memory, possibly CPU overclocking on some nodes, and so
> on.
>
> 3) Do you want to go deeper into the build/setup/deployment of Hadoop?
> It's worth trying to build, deploy, and debug Hadoop ecosystem components
> from scratch by setting up Apache Bigtop, which packages RPM/DEB artifacts
> and provides Puppet recipes for distributions.  It is the original root of
> both the Cloudera and Hortonworks distributions, so you will learn
> something about both by playing with it.
>
> We have some exercises you can use to guide you and get started:
> https://cwiki.apache.org/confluence/display/BIGTOP/BigTop+U%3A+Exersizes
> Feel free to join the mailing list for questions.
>
>
>
>
> On Sat, Mar 7, 2015 at 9:32 AM, max scalf <or...@gmail.com> wrote:
>
>> Krish,
>>
>> I don't mean to hijack your thread, but I wanted to find out what you did
>> for the portion quoted below, as I am trying to go down the same path.  I
>> was able to get a 4-5 node cluster running using Ambari and CDH and now
>> want to take it to the next level.  How did you do the following?
>>
>> "I have done a web log integration using Flume and Twitter sentiment
>> analysis."
>>
>> On Sat, Mar 7, 2015 at 12:11 AM, Krish Donald <go...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I would like to enter the Big Data world as a Hadoop admin.  I have set
>>> up a 7-node cluster using Ambari, Cloudera Manager, and Apache Hadoop,
>>> and I have installed services such as Hive, Oozie, and ZooKeeper.
>>>
>>> I have done a web log integration using Flume and Twitter sentiment
>>> analysis.
>>>
>>> What other skills should I learn?
>>>
>>> Thanks
>>> Krish
>>>
>>
>>
>
>
> --
> jay vyas
>

Re: What skills to Learn to become Hadoop Admin

Posted by jay vyas <ja...@gmail.com>.
Setting up vendor distros is a great first step.

1) Running TeraSort and benchmarking is a good next step.  You can also run
larger, full-stack Hadoop applications like BigPetStore, which we curate
here: https://github.com/apache/bigtop/tree/master/bigtop-bigpetstore/

2) Write some MapReduce or Spark jobs that write data to a persistent
transactional store, such as Solr or HBase.  This is a hugely important
part of real-world Hadoop administration, where you will encounter problems
like running out of memory, possibly CPU overclocking on some nodes, and so
on.

3) Do you want to go deeper into the build/setup/deployment of Hadoop?
It's worth trying to build, deploy, and debug Hadoop ecosystem components
from scratch by setting up Apache Bigtop, which packages RPM/DEB artifacts
and provides Puppet recipes for distributions.  It is the original root of
both the Cloudera and Hortonworks distributions, so you will learn
something about both by playing with it.

We have some exercises you can use to guide you and get started:
https://cwiki.apache.org/confluence/display/BIGTOP/BigTop+U%3A+Exersizes
Feel free to join the mailing list for questions.
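
As a rough illustration of step 1 above, here is a minimal Python sketch
that builds the usual TeraGen, TeraSort, and TeraValidate command lines for
a target data size.  The jar path, the HDFS base directory, and the helper
name terasort_commands are assumptions for illustration, not something
stated in this thread; the examples jar lives in different places on
different distributions.  The row count relies on TeraGen writing 100-byte
rows.

```python
# TeraGen writes fixed-size rows of 100 bytes each.
TERAGEN_ROW_BYTES = 100

# Assumed jar location (hypothetical for your cluster); it differs between
# Apache tarballs and vendor packages, so adjust it before running anything.
EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"


def terasort_commands(target_bytes, base_dir, jar=EXAMPLES_JAR):
    """Return the three benchmark commands as argument lists.

    target_bytes: amount of data TeraGen should write, in bytes.
    base_dir:     HDFS directory that will hold input, output, and report.
    """
    # Round up so that at least target_bytes of data are generated.
    rows = (target_bytes + TERAGEN_ROW_BYTES - 1) // TERAGEN_ROW_BYTES
    gen_dir = f"{base_dir}/teragen"
    sort_dir = f"{base_dir}/terasort"
    report_dir = f"{base_dir}/teravalidate"
    return [
        # 1. Generate the unsorted input data.
        ["hadoop", "jar", jar, "teragen", str(rows), gen_dir],
        # 2. Sort it; this is the step you would normally time.
        ["hadoop", "jar", jar, "terasort", gen_dir, sort_dir],
        # 3. Check that the output really is globally sorted.
        ["hadoop", "jar", jar, "teravalidate", sort_dir, report_dir],
    ]


if __name__ == "__main__":
    # Print the commands for a 10 GB run instead of executing them.
    for cmd in terasort_commands(10 * 1000**3, "/benchmarks/terasort"):
        print(" ".join(cmd))
```

Running the printed commands in order and timing the terasort step gives a
simple baseline that you can repeat after each configuration change.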




On Sat, Mar 7, 2015 at 9:32 AM, max scalf <or...@gmail.com> wrote:

> Krish,
>
> I don't mean to hijack your thread, but I wanted to find out what you did
> for the portion quoted below, as I am trying to go down the same path.  I
> was able to get a 4-5 node cluster running using Ambari and CDH and now
> want to take it to the next level.  How did you do the following?
>
> "I have done a web log integration using Flume and Twitter sentiment
> analysis."
>
> On Sat, Mar 7, 2015 at 12:11 AM, Krish Donald <go...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I would like to enter the Big Data world as a Hadoop admin.  I have set
>> up a 7-node cluster using Ambari, Cloudera Manager, and Apache Hadoop,
>> and I have installed services such as Hive, Oozie, and ZooKeeper.
>>
>> I have done a web log integration using Flume and Twitter sentiment
>> analysis.
>>
>> What other skills should I learn?
>>
>> Thanks
>> Krish
>>
>
>


-- 
jay vyas

Re: What skills to Learn to become Hadoop Admin

Posted by max scalf <or...@gmail.com>.
Krish,

I don't mean to hijack your thread, but I wanted to find out what you did
for the portion quoted below, as I am trying to go down the same path.  I
was able to get a 4-5 node cluster running using Ambari and CDH and now
want to take it to the next level.  How did you do the following?

"I have done a web log integration using Flume and Twitter sentiment
analysis."

On Sat, Mar 7, 2015 at 12:11 AM, Krish Donald <go...@gmail.com> wrote:

> Hi,
>
> I would like to enter the Big Data world as a Hadoop admin.  I have set up
> a 7-node cluster using Ambari, Cloudera Manager, and Apache Hadoop, and I
> have installed services such as Hive, Oozie, and ZooKeeper.
>
> I have done a web log integration using Flume and Twitter sentiment
> analysis.
>
> What other skills should I learn?
>
> Thanks
> Krish
>
