Posted to hdfs-dev@hadoop.apache.org by Gautham Banasandra <ga...@apache.org> on 2022/06/12 17:12:03 UTC

Re: Resources for understanding Hadoop

Hi Rahul,

> I was looking for something more detailed and low-level like how the code
> for the various services in HDFS is organized, entrypoints etc.

I found this book useful for getting a good overview of Hadoop in general:
Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache
Hadoop™ 2
<https://www.oreilly.com/library/view/apache-hadooptm-yarn/9780133441925/>.

In my opinion, the way to get into Open Source contributions is simply to
start contributing. You don't have to know HDFS in detail to start
contributing to it. Now that you've gone through the Hadoop documentation,
try setting up Hadoop in pseudo-distributed mode. If you notice any glitch,
try fixing it and send out a PR. You never know what issue you'll find. I
ran into one when I tried compiling Hadoop on Windows: [HDFS-15385] Upgrade
boost library to 1.72 <https://issues.apache.org/jira/browse/HDFS-15385>
(and yes, this was my first PR to Hadoop). Next, use Docker to set up a
Hadoop cluster with multiple nodes. Once you're able to do this, browse
issues.apache.org and you'll find tons of issues that you can work on.
There's always so much work to do in Open Source, and the thing I like most
is that "there's no deadline on anything" :) So you can really work on some
awesome stuff, own it, perfect it, and share it with the world.
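For the pseudo-distributed setup suggested here, the Apache single-node guide (SingleCluster.html) needs only two properties: `fs.defaultFS` in `core-site.xml` and `dfs.replication` in `hdfs-site.xml`. A minimal sketch that writes both files into a distribution's `etc/hadoop` directory, using the guide's example values (`hdfs://localhost:9000`, replication 1). Note that it overwrites any existing copies of those two files, and `write_pseudo_config` is a hypothetical helper name, not part of Hadoop:

```shell
#!/usr/bin/env bash
# Sketch: write the minimal pseudo-distributed HDFS config described in the
# Apache "Single Node Setup" guide. Assumes $1 is an unpacked Hadoop
# distribution; etc/hadoop is created if it does not exist yet.
# WARNING: overwrites core-site.xml and hdfs-site.xml in that directory.
set -euo pipefail

write_pseudo_config() {
  local hadoop_home="$1"
  local conf_dir="$hadoop_home/etc/hadoop"
  mkdir -p "$conf_dir"

  # fs.defaultFS tells clients and daemons where the NameNode listens.
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  # With a single DataNode, only one replica of each block can exist.
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Only act when a distribution directory is passed on the command line.
if [[ $# -ge 1 ]]; then
  write_pseudo_config "$1"
fi
```

Usage would be something like `bash write-pseudo-config.sh "$HADOOP_HOME"` (the script name is a placeholder).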

Best of luck.

Thanks,
--Gautham

On Sun, 12 Jun 2022 at 16:34, Rahul Bhardwaj <ra...@gmail.com> wrote:

> Hi all,
> I am a newbie wanting to start contributing to the Hadoop ecosystem. I want
> to start by contributing to HDFS and was looking for resources to
> understand the architecture, and I just found this:
>
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> which is fairly high-level documentation. I was looking for something
> more detailed and low-level, like how the code for the various services in
> HDFS is organized, entrypoints, etc. Can someone point me to such resources?
> Also, is there a Slack workspace for such discussions? I'm not sure if this
> mailing list is the right forum for such questions.
>

Re: Resources for understanding Hadoop

Posted by Brahma Reddy Battula <br...@apache.org>.
Please go through the following link.
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

*Use the following commands to start the cluster.*

$HADOOP_HOME/bin/hdfs --daemon start namenode

$HADOOP_HOME/bin/hdfs --daemon start datanode
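These commands assume `HADOOP_HOME` points at an assembled distribution rather than the source tree, which is probably why running `src/main/bin/start-dfs.sh` straight out of the checkout errors out: the scripts expect the `bin/`, `sbin/`, `libexec/` and `share/hadoop/` layout that the dist build produces. A fresh NameNode also has to be formatted once before its first start. Below is a minimal sketch of that first-time sequence, assuming `JAVA_HOME` is set and the pseudo-distributed config is already in place; the `DRY_RUN` switch and the `start_single_node_hdfs` function are illustrative names of this sketch, not part of Hadoop:

```shell
#!/usr/bin/env bash
# Sketch: first-time start of a single-node HDFS from a built distribution.
# Assumptions: HADOOP_HOME points at an unpacked distribution, JAVA_HOME is
# set, and core-site.xml/hdfs-site.xml are already configured.
# Commands are only printed unless DRY_RUN=0 (a convention of this sketch,
# not a Hadoop setting).
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "$DRY_RUN" != "0" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

start_single_node_hdfs() {
  local hdfs="$1/bin/hdfs"

  # Initialise the NameNode metadata directory. -nonInteractive aborts
  # instead of prompting if the directory already exists, so an existing
  # namespace is not wiped by accident.
  run "$hdfs" namenode -format -nonInteractive

  # Start each daemon in the background on this machine.
  run "$hdfs" --daemon start namenode
  run "$hdfs" --daemon start datanode

  # Ask the NameNode which DataNodes have registered.
  run "$hdfs" dfsadmin -report
}

# Print (or, with DRY_RUN=0, run) the sequence for the current HADOOP_HOME.
if [[ -n "${HADOOP_HOME:-}" ]]; then
  start_single_node_hdfs "$HADOOP_HOME"
fi
```

Running it as `DRY_RUN=0 bash start-hdfs.sh` (script name hypothetical) would execute the commands. The distribution's own `sbin/start-dfs.sh` starts the same daemons, but, as the single-node guide notes, it expects passphraseless ssh to localhost; `hdfs --daemon start` launches the daemons locally without ssh.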


On Thu, Jun 23, 2022 at 10:26 PM Rahul Bhardwaj <ra...@gmail.com>
wrote:

> Yeah, I followed this and can see that the .class files have been
> generated, but I'm not sure how to run it. The wiki I shared uses
> "start-dfs.sh", so I tried using the same script here from
> hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-dfs.sh, but it errors
> out. In BUILDING.txt I didn't find instructions on how to run the Hadoop
> daemons.
>

-- 
--Brahma Reddy Battula

Re: Resources for understanding Hadoop

Posted by Brahma Reddy Battula <br...@apache.org>.
Please go through the following:

https://github.com/apache/hadoop/blob/trunk/BUILDING.txt

Here is the specific command to generate a distribution that you can run
after making your changes:
mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true
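Once that build finishes, the runnable distribution should end up under `hadoop-dist/target/` in the source checkout: an unpacked `hadoop-<version>` directory, plus a tarball because of `-Dtar`. A small helper sketch to locate it, assuming that layout; `find_dist_dir` is a hypothetical name, and the exact version suffix depends on the branch you built:

```shell
#!/usr/bin/env bash
# Sketch: after "mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true",
# locate the assembled distribution in a Hadoop source checkout.
# Assumption: the build leaves an unpacked hadoop-<version>/ directory (and a
# .tar.gz because of -Dtar) under hadoop-dist/target.
set -euo pipefail

find_dist_dir() {
  local repo="$1"
  local dir
  # The trailing slash matches directories only, so the tarball is skipped;
  # requiring an executable bin/hdfs filters out unrelated build folders.
  for dir in "$repo"/hadoop-dist/target/hadoop-*/; do
    if [[ -x "${dir}bin/hdfs" ]]; then
      echo "${dir%/}"
      return 0
    fi
  done
  echo "no built distribution found under $repo/hadoop-dist/target" >&2
  return 1
}

# Only act when a checkout path is passed on the command line.
if [[ $# -ge 1 ]]; then
  find_dist_dir "$1"
fi
```

With this, `export HADOOP_HOME="$(bash find-dist.sh ~/src/hadoop)"` (script name and checkout path are placeholders) would let the `$HADOOP_HOME/bin/hdfs` commands in this thread run against your own build.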

Hope this helps.

-- 
--Brahma Reddy Battula

Re: Resources for understanding Hadoop

Posted by Rahul Bhardwaj <ra...@gmail.com>.
I am following this wiki
<https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html>
to build and run Hadoop locally in pseudo-distributed mode, but I am unable
to figure out how to build my changes and generate similar binaries so that
I can test them locally. Is there any documentation on how to do this?

Re: Resources for understanding Hadoop

Posted by Brahma Reddy Battula <br...@apache.org>.
Hi Rahul,

Welcome to the Hadoop world.

Apart from what Gautham mentioned, you can also check the following:
https://livebook.manning.com/book/hadoop-in-action/part-1/

Go through the following wiki for contributions:
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute

Please subscribe to the Hadoop mailing list [1] and send your questions
there from now on.

1. https://hadoop.apache.org/mailing_lists.html

-- 
--Brahma Reddy Battula
