Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2007/01/21 09:49:35 UTC

[Nutch Wiki] Update of "Becoming A Nutch Developer" by DennisKubes

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by DennisKubes:
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer

The comment on the change is:
Beginning of "Becoming a Nutch Developer" Document

New page:
== Overview ==
Search is complex.  Nutch makes it easier.  So you start off by installing Nutch.  You are a pretty good developer, so you get Nutch up and running without problems, and you think it is a pretty neat piece of software.  In fact you like it so much that you want to start adding to it.  You want to develop new features and contribute them back to the community.  But then it dawns on you:  how does one contribute to an open source project like Nutch?  You have never worked on an open source project before and don't really know how the entire process works.

That is where this document comes in.  The purpose of this document is to help you as a developer take the next step in becoming a contributing member of the Nutch community.  We will cover a general overview of the Nutch development process:  what the different pieces are and how they fit together, and how the community works and interacts.  We will cover using the mailing lists to search for information and how to ask questions so that they get answered.  We will cover how to go about learning the internals of the Nutch codebase.  We will cover how to start developing for Nutch, including coding standards, using Subversion, setting up Nutch in development environments, building Nutch from source, debugging, and unit tests.  And finally we will cover contributing back to the Nutch community through documentation, code fixes, new features, and providing guidance to other developers.  When we are finished you should have a good understanding of how the community works and how you can go about becoming a bigger part of that community.

== The Nutch Community ==

=== Nutch Development Roles ===
There are three main roles that people can play in the Nutch community. 

The first role is that of user.  This is someone who uses the Nutch software but is not active in its development.  People in this category range from the curious programmer who wants to learn more about search technology to corporations setting up search on their local intranet.  If you only want to use the Nutch software and don't want to help develop it, you can still be a contributing member of the community.  By using the software and pushing the limits of what it can do, by filing bug reports and feature requests (more about how to do this later), by working with developers to track down issues, or just by giving your input to discussions that arise, you can help the Nutch project become better and better.
 
The second role is that of developer.  This is someone who has used the Nutch software and has taken the next step to help develop or program the software.  Helping to develop the Nutch codebase can come in the form of code fixes, called patches, or of developing completely new features from scratch.  An important thing to remember is that, unlike most software development at big companies, you don't need anybody's permission to start developing software for Nutch.  If you think you have a good idea for a feature, or if you want to track down and fix bugs in the software, go do it.  If you want a specific piece of functionality, don't wait for someone else to develop it.  Take the time to learn and do it yourself.  Then when you are done give it back to the community.  This is how the Nutch project has been developed so far and how it will continue to be developed in the future.  The community is a do-ocracy, meaning those who do the work get to help set the directions and make the decisions.  Communication is essential but not limiting.  Anybody can become a developer simply by submitting source code, whether fixes or new functionality, for inclusion in the project.

The third role is that of committer.  This is usually a developer who has been working with the project for some time:  someone who has developed new pieces of functionality, who has fixed bugs, and who has helped others by answering questions and providing guidance through the mailing lists.  In other words this is a person who has proved their commitment and usefulness to the project and in return is given commit access to the source code repository, an Apache email address, and the ability to help make short term decisions for the project by determining what submissions and bug fixes make it into the source code repository and release versions of the software.

=== How the Community Works ===
The community works together through shared mailing lists, email, wikis, bug tracking systems, and source code repositories.  These tools when used together provide a virtual meeting room and workspace for all members of the community.

==== Mailing Lists and Email ====
Most communication is done through email and the mailing lists.  Because of this the first thing that any person should do to become part of the Nutch community is to join the appropriate mailing lists.  There are four different mailing lists.  First there is the users mailing list.  Contrary to the name this list is not just for users.  If you have questions about the Nutch software including installation, configuration, bugs, errors, or general information, this is the list for you.  Second is the dev mailing list.  This is where most development communication occurs, including updates to request tracking systems.  This is also where developers can pose ideas for new functionality to see if someone is already working on such features or just to get general feedback.  The dev mailing list is important for tracking functionality that other developers may be working on and for reaching consensus in the community on the desired direction of new features.  The third list is the commits mailing list.  This list tracks commits to the source code repository and changes in the wiki pages.  The fourth list is the agents list.  This is where webmasters and other people can post comments or questions about the Nutch crawler.

Users can get by with subscribing to only the users mailing list.  Developers should subscribe to all four mailing lists.  Anybody doing internet crawls needs to be subscribed to the agents list.  In order to post to any list, for example to ask a question, it is necessary to first be subscribed to that list.  Below are links for subscribing to the different mailing lists.

 * [http://lucene.apache.org/nutch/mailing_lists.html Nutch Mailing Lists]
 * [mailto:nutch-user-subscribe@lucene.apache.org Subscribe to Users]
 * [mailto:nutch-dev-subscribe@lucene.apache.org Subscribe to Dev]
 * [mailto:nutch-commits-subscribe@lucene.apache.org Subscribe to Commits]
 * [mailto:nutch-agent-subscribe@lucene.apache.org Subscribe to Agents]

==== JIRA and Issue/Request Tracking ====
If mailing lists provide the ongoing conversation for the community, the issue/request tracking system provides a repository for the current state of the project.  The request tracking system is JIRA, and it can be accessed at the addresses below.

 * [https://issues.apache.org/jira/browse/NUTCH Nutch JIRA]
 * [https://issues.apache.org/jira/secure/Signup!default.jspa Signup for JIRA]

The JIRA system is the central repository for all work proposed for inclusion in the Nutch source code base.  The system tracks issues and feature requests by component, by version, and by status.  You can view what requests are assigned to what person, what requests are currently being worked on, and which ones haven't been scheduled.  You can search all requests by keyword or by various categories and filters.  We will go into detail later on how to use the JIRA system to propose new functionality and submit bug fixes.  For now understand this.  If you are going to be a developer you will need to understand how to use the JIRA system, as this is where you will propose new functionality, submit bug fixes, give your input on features other developers may be working on, and coordinate actions with other developers on specific pieces of functionality.

The address to sign up for JIRA was given above.  Once you have signed up you will have access to all of the Apache JIRA repository, not just the Nutch project.

==== Source Code Control through Subversion ====
Source code control is very important to open source projects.  Nutch uses the Apache Subversion repository for its source control.  As a developer you will want to get into the habit of downloading and updating your development environment directly from the Subversion repository.  We will go into detail about how to do this later.  There are two types of logins to the repository, users and committers.  Users can download the repository but cannot make changes directly to the repository.  As a user you can make changes on your local system and submit those changes to the JIRA system.  Committers hold the committer role that we discussed previously.  These individuals can make changes directly to the Subversion repository and are responsible for taking patches from the JIRA system and applying them to Subversion, where they then become available to all users.

==== Wiki and Documentation ====
The weakest part of most open source projects is their documentation, and Nutch is no exception.  Wikis are special web pages, like the one that you are reading, that allow users to directly edit text on the page and to create new pages.  The wiki provides various tutorials and documentation for Nutch.  Links to view the Nutch wiki and to register for the wiki are provided below.

 * [http://wiki.apache.org/nutch/ Nutch Wiki]
 * [http://wiki.apache.org/nutch/UserPreferences Signup for the Wiki]

As a developer one of the ways you can contribute back to the community is by documenting your hard-won experience on the wiki.  You can do this in the form of tutorials, articles, or simple notes.  The wiki is also used as a virtual white board to help document general themes and directions for the project.

These four tools (mailing lists and email, JIRA, the wiki, and Subversion) together give the community ways to coordinate its actions and conversations.  As a developer you will need to understand each of these tools and how you will use them in the development process.  Later parts of this document will explain each of these tools in more detail to give you the base of knowledge you will need to start being a productive member of the Nutch development community.

... more to come later