
Posted to common-dev@hadoop.apache.org by Geoffrey Gallaway <ge...@geoffeg.org> on 2009/09/30 02:39:31 UTC

Developing Hadoop and HDFS

Hello,

Yes, another person looking to contribute to and develop Hadoop. I'm looking
to start off small, fixing a few bugs before moving into larger stuff.

First, a bit of background:
Years ago I had the idea of creating a semi-decentralized distributed file
system. The idea came when I was working for a small to medium-sized company
that was looking for a simple backup solution for its workstations. PCs
back then came with 100+ GB hard drives, but as simple workstations,
employees were using less than half that space. Why not have each
workstation back up to a few other workstations, duplicating files across
multiple machines for redundancy? RAID for the network. I started coming up
with design and architecture specs, protocol examples and even started
writing a bit of the system (in Java). I tried to find a few interested
developers, but everyone seemed to think the task was much too large to be
accomplished as a side project (and I didn't think, given the IT industry of
the time, that anyone would fund it). Later, I realized such a distributed
system could be much more than a simple file backup solution.

It looks like Hadoop and HDFS are creating a lot of what I had wanted to
create; in most ways, they have already surpassed what I had in mind.

So, where should I start? Just start fixing bugs listed in JIRA?

Geoff

Re: Developing Hadoop and HDFS

Posted by Jakob Homan <jh...@yahoo-inc.com>.

Thanks for your interest, Geoff.  Yes, finding open JIRAs and 
contributing patches is very helpful.  We also maintain a wishlist of 
projects that one could work on: 
http://wiki.apache.org/hadoop/ProjectSuggestions.  In addition, please 
do consider documentation and example work as well, as this is very 
helpful to both new users and developers starting on the project.

Thanks,
Jakob
Hadoop at Yahoo!