Posted to common-dev@hadoop.apache.org by Mohammad Mustaqeem <3m...@gmail.com> on 2013/04/17 11:20:56 UTC

How to understand Hadoop source code ?

Hello everyone,
I am new to this group. The Hadoop source code is very large, and I am not
able to understand all of it.
Is there any document that describes the code?
Is there any way to understand the functionality of each class and its
methods?


-- 
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad

Re: How to understand Hadoop source code ?

Posted by Ronnie Ghose <ro...@gmail.com>.
+1 I'm one of those new people :)


On Thu, Apr 18, 2013 at 1:32 PM, Noelle Jakusz (c) <nj...@vmware.com> wrote:

> +1
>
> There are quite a few new people, so maybe start a collaborative group
> where you can collect notes and steps (videos and articles). I know I would
> have some for you that I have created as I have gotten started... it would
> be a great idea to post them after some collaboration and review.
>
> Thanks Chris for the detailed reply...

Re: How to understand Hadoop source code ?

Posted by Mohammad Mustaqeem <3m...@gmail.com>.
@Noelle, how can I join you?
I also want to learn.


On Sat, Apr 20, 2013 at 3:38 AM, Noelle Jakusz (c) <nj...@vmware.com> wrote:

> Thanks Steve. I think creating a list on the wiki is a good place to
> start; then we can divide and conquer. As newbies, when we figure something
> out it will be fresh in our minds and we can add it to the documentation.
> I agree that this would be very useful for future participants in the project.
>
> I have created an account (noellejakusz) and would like write access to
> help with this...
>
> Thanks!



-- 
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad
9026604270

Re: How to understand Hadoop source code ?

Posted by Steve Loughran <st...@hortonworks.com>.
On 19 April 2013 23:08, Noelle Jakusz (c) <nj...@vmware.com> wrote:

> I have created an account (noellejakusz) and would like write access to
> help with this...

OK, you have write access.

RE: How to understand Hadoop source code ?

Posted by "Noelle Jakusz (c)" <nj...@vmware.com>.
Thanks Steve. I think creating a list on the wiki is a good place to start; then we can divide and conquer. As newbies, when we figure something out it will be fresh in our minds and we can add it to the documentation. I agree that this would be very useful for future participants in the project.

I have created an account (noellejakusz) and would like write access to help with this...

Thanks!


Re: How to understand Hadoop source code ?

Posted by Steve Loughran <st...@hortonworks.com>.
On 18 April 2013 18:32, Noelle Jakusz (c) <nj...@vmware.com> wrote:

> +1
>
> There are quite a few new people, so maybe start a collaborative group
> where you can collect notes and steps (videos and articles). I know I would
> have some for you that I have created as I have gotten started... it would
> be a great idea to post them after some collaboration and review.
>
> Thanks Chris for the detailed reply...
>
>
Content on wiki.apache.org would be welcome, though it comes with a
commitment to keep it up to date.

Once you've created wiki accounts, email this list to get write access.

One thing to consider is prerequisites. Hadoop is not a place to learn
about basic distributed system concepts (liveness, failures, RPC), though
some of the specifics (how liveness is implemented, how Hadoop RPC works)
are going to be relevant. It's probably best to list "things you need to
know". In particular, you should know Java and its build and test tools
before going near Hadoop.

-steve

RE: How to understand Hadoop source code ?

Posted by "Noelle Jakusz (c)" <nj...@vmware.com>.
+1

There are quite a few new people, so maybe we could start a collaborative group to collect notes and steps (videos and articles). I have some that I created as I got started, and it would be a great idea to post them after some collaboration and review.

Thanks Chris for the detailed reply...


Re: How to understand Hadoop source code ?

Posted by Chris Nauroth <cn...@hortonworks.com>.
Is there a specific bug fix or feature that you are trying to contribute?
Specific questions like "how can I help with JIRA X?" or "what is the main
entry point when I run the hdfs command?" or "where does the namenode
serialize metadata to disk?" or "where does the secondary namenode execute a
checkpoint?" can help focus the conversation.

AFAIK, we don't have a general code walkthrough document focused on
onboarding new engineers.  This could be a valuable contribution if you
want to gather notes while you learn.  I think this always works best if
it's driven by a new engineer with review by an expert.  (If the experts
write it, then they might accidentally skip something non-obvious that
they've already internalized.)

Since that document doesn't exist yet, the other option is to do some
reading of the code, ideally while trying to fix a specific bug that has
been filed in JIRA.  Like you said, it's a relatively large codebase, so
it's impractical to read the whole thing top-to-bottom.  Instead, it's
important to look for high-level clues that steer you towards the right
files.  I've found that the Maven module structure and the Java package
names are usually descriptive enough to steer me in the right direction.
If you focus on getting familiar with those, you'll basically build a
B-tree inside your brain that helps you index into the right part of the
codebase and answer your own questions rapidly.  Several examples:

"Where is the main entry point for the datanode daemon?": module
hadoop-hdfs, package org.apache.hadoop.hdfs.server.datanode

"What is the algorithm for rebalancing an unbalanced cluster?": module
hadoop-hdfs, package org.apache.hadoop.hdfs.server.balancer

"How does YARN launch a new container process?": module
hadoop-yarn-server-nodemanager,
package org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher

"Multiple daemons publish JMX metrics as a common concern.  Where is that
implemented?": module hadoop-common, package org.apache.hadoop.metrics2
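
A minimal sketch of how the module-and-package heuristic above could be
applied mechanically, assuming you have a local checkout of the source tree.
It walks a source directory and prints the fully qualified name of every
class that appears to declare a main method, which is one rough way to spot
daemon and tool entry points inside a package.  The class name
EntryPointFinder and its methods are hypothetical and not part of Hadoop,
and the matching is a plain text search rather than a real Java parser, so
treat the output as a list of candidates only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Hadoop): lists classes that declare a main method. */
public class EntryPointFinder {
    // Matches a package declaration such as "package org.example.demo;".
    private static final Pattern PACKAGE =
        Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    // Rough textual match for "public static void main(String ..."; not a real parser.
    private static final Pattern MAIN =
        Pattern.compile("public\\s+static\\s+void\\s+main\\s*\\(\\s*(final\\s+)?String");

    /** Returns the declared package name, if the source text has one. */
    static Optional<String> packageOf(String source) {
        Matcher m = PACKAGE.matcher(source);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Returns true if the source text appears to declare a main method. */
    static boolean declaresMain(String source) {
        return MAIN.matcher(source).find();
    }

    /** Walks root and returns the sorted names of .java files that declare main. */
    static List<String> findEntryPoints(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".java"))
                .map(EntryPointFinder::describe)
                .flatMap(Optional::stream)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Reads one file and builds "package.ClassName" if it declares main.
    private static Optional<String> describe(Path file) {
        String source;
        try {
            source = Files.readString(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!declaresMain(source)) {
            return Optional.empty();
        }
        String simpleName = file.getFileName().toString().replaceFirst("\\.java$", "");
        return Optional.of(
            packageOf(source).map(p -> p + "." + simpleName).orElse(simpleName));
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no path is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        findEntryPoints(root).forEach(System.out::println);
    }
}
```

On Java 11 or later this can be run as a single file, for example
"java EntryPointFinder.java <path-to-module>/src/main/java", where the path
is whichever module directory you want to explore.  Filtering the output by
package name then narrows the list to the daemon or tool you care about.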

I hope this is helpful to get the process started for you.  We're always
here to help if you have specific follow-up questions.

Thanks,
--Chris


On Wed, Apr 17, 2013 at 10:33 PM, Prabakaran Krishnan <
prabakaran_j2ee@yahoo.in> wrote:

> Could you please help me understand MapReduce in Hadoop?

Re: How to understand Hadoop source code ?

Posted by Prabakaran Krishnan <pr...@yahoo.in>.
Could you please help me understand MapReduce in Hadoop?




Re: How to understand Hadoop source code ?

Posted by Mohammad Mustaqeem <3m...@gmail.com>.
I am interested in HDFS. Please guide me.


On Thu, Apr 18, 2013 at 3:36 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Please don't cross post.
>
> What parts of Hadoop are you interested in? HDFS? YARN? MapReduce?
>
> Arun


-- 
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad
9026604270

Re: How to understand Hadoop source code ?

Posted by Arun C Murthy <ac...@hortonworks.com>.
Please don't cross post.

What parts of Hadoop are you interested in? HDFS? YARN? MapReduce?

Arun

On Apr 17, 2013, at 2:50 PM, Mohammad Mustaqeem wrote:

> Hello everyone,
> I am new to this group. The Hadoop source code is very large, and I am not
> able to understand all of it.
> Is there any document that describes the code?
> Is there any way to understand the functionality of each class and its
> methods?
> 
> 
> -- 
> *With regards ---*
> *Mohammad Mustaqeem*,
> M.Tech (CSE)
> MNNIT Allahabad

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/