Posted to general@hadoop.apache.org by Chris Douglas <cd...@apache.org> on 2010/05/29 02:37:06 UTC

Contributor Meeting Minutes 05/28/2010

This month, the MapReduce + HDFS contributor meeting was held at
Cloudera Headquarters.

Announcements for contributor meetings are here:
http://www.meetup.com/Hadoop-Contributors/

Minutes follow. No decisions were made at this meeting, but the
following issues were discussed and may presage future discussion and
decisions on these lists.

Eli, I think you have all the slides. Would you mind sending them out? -C

== 0.21 release update ==
* Continuing to close blockers, ping people for updates and suggestions
* About 20 open blockers. Many are MapReduce documentation that may be
pushed. Speak up if 0.21 is missing anything substantive.
* Common/HDFS visibility and annotations are close to consensus;
MapReduce annotations are committed to trunk and the 0.21 branch

== HEP proposal ==
(what follows is the sketch presented at the meeting. A full proposal
with concrete details will be circulated on the list)

* Based on, and very similar to, the PEP (Python Enhancement Proposal) process
* Audience is HDFS and MapReduce; not necessarily adopted by other subprojects
  - Addresses the perception that there is friction between
innovation/experimentation and stability
* Not for small enhancements, features, and bug fixes. This should not
slow down typical development or impede casual contribution to Hadoop
* Primary mechanism for new features, collecting input, documenting
design decisions
* JIRA is good for details, but not for deciding on wide shifts in direction
* Purpose is for the Author to build consensus and gather dissenting opinions.
  - All may comment, but Editors will review incoming HEP material
  - Editors determine only whether the HEP is complete, not whether
they believe it is a sound idea
  - Editors are appointed by the PMC
  - Mechanism for appointing Editors and term of service TBD
    - The Apache Board assigns Shepherds to projects somewhat randomly.
A similar mechanism could work for incoming HEPs
  - Proposal *may* come with code, but not necessarily.
  - Drafting/baking of the HEP occurs in public on a list dedicated to
that particular proposal. Once Editors certify the HEP as complete, it
is sent to general@ for wider discussion.
    - The discussion phase begins on general@. The proposal-specific
list exists to ensure the HEP is complete enough to present to the
community.
  - Some discussion on the difference between posting to general@ and
posting to the HEP list. Completeness is, of course, subjective. If
the Editor and Author disagree whether the proposal affects an aspect
of the framework enough to merit special consideration, it is not
entirely clear how to resolve the disagreement.
    - In general, the role of the Editor in the community-driven
process of Hadoop is not entirely clear. It may be possible to
optimize it out.
  - Once discussion ends, the HEP is passed (or fails to pass) by a
vote of the PMC (mechanics undefined). In Python, the result is
committed to the repository. A similar practice would make sense in
Hadoop.
* Which issues require HEPs?
  - Discussion ranged widely. Append, backup namenode, edit log rewrite,
and similar work were cited as features substantial enough to merit a
HEP. Pure Java CRC is an example of an enhancement that would not.
Whether an explicit process must be in place to determine whether an
issue requires a HEP is not clear.
  - Viewing HEPs as a way of soliciting consensus for an approach
might be more accurate. Going through the HEP process should always
improve the chances of a successful proposal

* Evaluation
  - The proposal may be rejected if it is redundant with existing
functionality, technically unsound, insufficiently motivated, lacks a
backwards compatibility story, etc.
  - Implementation is not necessary, and is lightly discouraged.
Feedback is less welcome once code is in hand.
  - Purpose is to be clear about the acceptance criteria for that
issue, e.g. concerns that the proposal may not scale or may harm
performance
  - Dissenting opinions must be recorded accurately. Quoting dissenters
directly is a safe practice for the Author, and should encourage HEP
reviewers not to block the product of the proposal.

* The testing burden and completion strategy may be ambiguous
  - Whether the proposal affects scalability may not be testable by
the implementer. Completing the proposal to address all use cases may
require considerably more work than the Author is willing or motivated
to invest.
  - The HEP discussion on general@ should explore whether such
objections are merited and reasonable. For example, a particularly
obscure/esoteric use case could be included as a condition for
acceptance if the dissenter is willing to invest the resources to
test/validate it. The process is flexible in this regard.
    - But it is not infinitely flexible. Backwards compatibility,
performance regression, availability, and similar considerations always
apply and need not be called out in every HEP.
    - Traditional concerns need to be documented. Acceptance criteria
should ideally be automated and reproducible in different
organizations
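
The lifecycle sketched above (drafting on a proposal-specific list,
certification by Editors, discussion on general@, PMC vote) can be
pictured as a small state machine. What follows is only an illustrative
sketch: the state names, event names, and the advance() helper are
hypothetical and not part of any proposal, and the minutes leave the
voting mechanics undefined.

```python
from enum import Enum


class HepState(Enum):
    # State names are hypothetical labels for the phases in the minutes.
    DRAFT = "draft"            # being drafted on the proposal-specific list
    DISCUSSION = "discussion"  # certified complete, under discussion on general@
    ACCEPTED = "accepted"      # passed the PMC vote
    REJECTED = "rejected"      # failed the PMC vote


# Allowed transitions, keyed by (current state, event).
# Event names are assumptions made for this sketch.
TRANSITIONS = {
    (HepState.DRAFT, "editors_certify_complete"): HepState.DISCUSSION,
    (HepState.DISCUSSION, "pmc_vote_passes"): HepState.ACCEPTED,
    (HepState.DISCUSSION, "pmc_vote_fails"): HepState.REJECTED,
}


def advance(state, event):
    """Return the next state, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None
```

For example, advance(HepState.DRAFT, "editors_certify_complete")
returns HepState.DISCUSSION, while attempting a PMC vote on a HEP that
is still a draft raises ValueError. If the Editor role were optimized
out, as suggested above, the first transition would need a different
trigger.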

== Branching ==
* A patch and a branch are isomorphic from a policy perspective. Of
course, they are functionally distinct: branches are easier to
collaborate on and are generally longer-lived than patches. But
special policies need not be devised to account for these differences,
which concern the production of the code, not its review and
acceptance.
* Some developers find branches to be easier to review than very large
patches and easier to merge, given a toolchain that supports this.
  - Subversion is currently difficult to adapt to this model
  - Could be done on a HEP-by-HEP basis, as a condition for acceptance
* Eclipse Labs
  - Branded version of Google Code (same functionality, w/ Eclipse brand)
  - Not official Eclipse projects, but associated with Eclipse
  - Apache/Hadoop may consider a similar strategy
  - Distinct from Apache Labs, as one need not be a committer or follow
its rules for releases, etc.

== Contrib ==
* Modules (such as fuse-dfs) are not actively maintained in the main
repository and would benefit from a release schedule decoupled from
the rest of Hadoop
* With few exceptions, the contrib modules have smaller, often
discrete groups of maintainers. It may be worth exploring whether
these projects could live elsewhere

Re: Contributor Meeting Minutes 05/28/2010

Posted by Eli Collins <el...@cloudera.com>.

I posted a link to the slides on the wiki:

http://wiki.apache.org/hadoop/HadoopContributorsMeeting20100528

On Fri, May 28, 2010 at 7:59 PM, Eli Collins <el...@cloudera.com> wrote:
> Slides attached.  Thanks for taking notes Chris!

Re: Contributor Meeting Minutes 05/28/2010

Posted by Eli Collins <el...@cloudera.com>.

The list stripped my slides. Posted notes to the wiki, which doesn't
seem to allow attachments so not sure where to put slides.

http://wiki.apache.org/hadoop/HadoopContributorsMeeting20100528


On Fri, May 28, 2010 at 7:59 PM, Eli Collins <el...@cloudera.com> wrote:
> Slides attached.  Thanks for taking notes Chris!
>>  - Could be done on a HEP-by-HEP basis, as a condition for acceptance
>> * Eclipse Labs
>>  - Branded version of Google Code (same functionality, w/ Eclipse brand)
>>  - Not official Eclipse projects, but associated with Eclipse
>>  - Apache/Hadoop may consider a similar strategy
>>  - Distinct from Apache Labs, as one need not be a committer, follow
>> its rules for releases, etc.
>>
>> == Contrib ==
>> * Modules (such as fuse-dfs) are not actively maintained in the main
>> repository and would benefit from a release schedule decoupled from
>> the rest of Hadoop
>> * With few exceptions, the contrib modules have smaller, often
>> discrete groups of maintainers. It may be worth exploring whether
>> these projects could live elsewhere
>>
>

Re: Contributor Meeting Minutes 05/28/2010

Posted by Eli Collins <el...@cloudera.com>.

The list stripped my slides. I posted the notes to the wiki, which doesn't
seem to allow attachments, so I'm not sure where to put the slides.

http://wiki.apache.org/hadoop/HadoopContributorsMeeting20100528


On Fri, May 28, 2010 at 7:59 PM, Eli Collins <el...@cloudera.com> wrote:
> Slides attached.  Thanks for taking notes Chris!

Re: Contributor Meeting Minutes 05/28/2010

Posted by Eli Collins <el...@cloudera.com>.

I posted a link to the slides on the wiki:

http://wiki.apache.org/hadoop/HadoopContributorsMeeting20100528

On Fri, May 28, 2010 at 7:59 PM, Eli Collins <el...@cloudera.com> wrote:
> Slides attached.  Thanks for taking notes Chris!

Re: Contributor Meeting Minutes 05/28/2010

Posted by Eli Collins <el...@cloudera.com>.

Slides attached. Thanks for taking notes, Chris!



Re: Contributor Meeting Minutes 05/28/2010

Posted by Deepak Sharma <de...@gmail.com>.

Hi All,
I am from India and have been involved with Hadoop for the last 1-2 months.
I am planning to start a Hadoop tutorial in India and would need help with it.
Please point me to some really good tutorials on Hadoop, and please also
suggest what could be included in the course content so that this becomes a
job-oriented course.

Looking forward to your reply.

Thanks,
Deepak

-- 
Deepak Sharma
http://www.linkedin.com/in/rikindia

Re: Contributor Meeting Minutes 05/28/2010

Posted by Eli Collins <el...@cloudera.com>.

Slides attached. Thanks for taking notes, Chris!


On Fri, May 28, 2010 at 5:37 PM, Chris Douglas <cd...@apache.org> wrote:
>  - Once discussion ends, the HEP is passed (or fails to pass) by a
> vote of the PMC (mechanics undefined). In Python, the result is
> committed to the repository. A similar practice would make sense in
> Hadoop.
> * Which issues require HEPs?
>  - Discussion ranged. Append, backup namenode, edit log rewrite, et
> al. were examples of features substantial enough to merit a HEP. Pure
> Java CRC is an example of an enhancement that would not. Whether an
> explicit process must be in place to determine whether an issue
> requires a HEP is not clear.
>  - Viewing HEPs as a way of soliciting consensus for an approach
> might be more accurate. Going through the HEP process should always
> improve the chances of a successful proposal
>
> * Evaluation
>  - The proposal may be rejected if it is redundant with existing
> functionality, technically unsound, insufficiently motivated, no
> backwards compatibility story, etc.
>  - Implementation is not necessary, and is lightly discouraged.
> Feedback is less welcome once code is in hand.
>  - Purpose is to be clear about the acceptance criteria for that
> issue, e.g. concerns that the proposal may not scale or may harm
> performance
>  - Dissenting opinions must be recorded accurately. Quoting would be
> a safe practice for the Author to encourage HEP reviewers not to block
> the product of the proposal.
>
> * The testing burden and completion strategy may be ambiguous
>  - Whether the proposal affects scalability may not be testable by
> the implementer. Completing the proposal to address all use cases may
> require considerably more work than the Author is willing or motivated
> to invest.
>  - The HEP discussion on general@ should explore whether such
> objections are merited and reasonable. For example, a particularly
> obscure/esoteric use case could be included as a condition for
> acceptance if the dissenter is willing to invest the resources to
> test/validate it. The process is flexible in this regard.
>    - But it is not infinitely flexible. Backwards compatibility,
> performance regression, availability, and other considerations need
> not be called out in every HEP.
>    - Traditional concerns need to be documented. Acceptance criteria
> should ideally be automated and reproducible in different
> organizations
>
> == Branching ==
> * A patch and a branch are isomorphic from a policy perspective. Of
> course, they are functionally distinct: branches are easier to
> collaborate on and are, generally, longer-lived than are patches. But
> special policies need not be derived to account for these differences,
> which concern the production of the code, not its review and
> acceptance.
> * Some developers find branches to be easier to review than very large
> patches and easier to merge, given a toolchain that supports this.
>  - Subversion currently is difficult to adapt to this model
>  - Could be done on a HEP-by-HEP basis, as a condition for acceptance
> * Eclipse Labs
>  - Branded version of Google Code (same functionality, w/ Eclipse brand)
>  - Not official Eclipse projects, but associated with Eclipse
>  - Apache/Hadoop may consider a similar strategy
>  - Distinct from Apache Labs, as one need not be a committer, follow
> its rules for releases, etc.
>
> == Contrib ==
> * Modules (such as fuse-dfs) are not actively maintained in the main
> repository and would benefit from a release schedule decoupled from
> the rest of Hadoop
> * With few exceptions, the contrib modules have smaller, often
> discrete groups of maintainers. It may be worth exploring whether
> these projects could live elsewhere
>

Re: Contributor Meeting Minutes 05/28/2010

Posted by Deepak Sharma <de...@gmail.com>.

Hi All,
I am from India and have been involved with Hadoop for the last 1-2 months.
I am planning to start a Hadoop tutorial in India and would need help here.
Please let me know of some really good tutorials on Hadoop, and also whether you
can suggest what could be included in the course content, so that this
becomes a job-oriented course.

Looking forward to your reply.

Thanks,
Deepak

On Sat, May 29, 2010 at 6:07 AM, Chris Douglas <cd...@apache.org> wrote:

> This month, the MapReduce + HDFS contributor meeting was held at
> Cloudera Headquarters.
>
> Announcements for contributor meetings are here:
> http://www.meetup.com/Hadoop-Contributors/
>
> Minutes follow. No decisions were made at this meeting, but the
> following issues were discussed and may presage future discussion and
> decisions on these lists.
>
> Eli, I think you have all the slides. Would you mind sending them out? -C
>
> == 0.21 release update ==
> * Continuing to close blockers, ping people for updates and suggestions
> * About 20 open blockers. Many are MapReduce documentation that may be
> pushed. Speak up if 0.21 is missing anything substantive.
> * Common/HDFS visibility and annotations are close to consensus;
> MapReduce annotations are committed to trunk and the 0.21 branch
>
> == HEP proposal ==
> (what follows is the sketch presented at the meeting. A full proposal
> with concrete details will be circulated on the list)
>
> * Based on- and very similar to- the PEP (Python Enhancement Proposal)
> Process
> * Audience is HDFS and MapReduce; not necessarily adopted by other
> subprojects
>  - Addresses the perception that there is friction between
> innovation/experimentation and stability
> * Not for small enhancements, features, and bug fixes. This should not
> slow down typical development or impede casual contribution to Hadoop
> * Primary mechanism for new features, collecting input, documenting
> design decisions
> * JIRA is good for details, but not for deciding on wide shifts in
> direction
> * Purpose is for author to build consensus and gather dissenting opinions.
>  - All may comment, but Editors will review incoming HEP material
>  - Editors determine only whether the HEP is complete, not whether
> they believe it is a sound idea
>  - Editors are appointed by the PMC
>  - Mechanism for appointing Editors and term of service TBD
>    - Apache Board appoints Shepherds for projects somewhat randomly,
> to projects. A similar mechanism could work for incoming HEPs
>  - Proposal *may* come with code, but not necessarily.
> Drafting/baking of the HEP occurs in public on a list dedicated to
> that particular proposal. Once Editors certify the HEP as complete, it
> is sent to general@ for wider discussion.
>    - The discussion phase begins on general@. The mailing list exists
> to ensure the HEP is complete enough to present to the community.
>  - Some discussion on the difference between posting to general@ and
> posting to the HEP list. Completeness is, of course, subjective. If
> the Editor and Author disagree whether the proposal affects an aspect
> of the framework enough to merit special consideration, it is not
> entirely clear how to resolve the disagreement.
>    - In general, the role of the Editor in the community-driven
> process of Hadoop is not entirely clear. It may be possible to
> optimize it out.
>  - Once discussion ends, the HEP is passed (or fails to pass) by a
> vote of the PMC (mechanics undefined). In Python, the result is
> committed to the repository. A similar practice would make sense in
> Hadoop.
> * Which issues require HEPs?
>  - Discussion ranged. Append, backup namenode, edit log rewrite, et
> al. were examples of features substantial enough to merit a HEP. Pure
> Java CRC is an example of an enhancement that would not. Whether an
> explicit process must be in place to determine whether an issue
> requires a HEP is not clear.
>  - Viewing HEPs as a way of soliciting consensus for an approach
> might be more accurate. Going through the HEP process should always
> improve the chances of a successful proposal
>
> * Evaluation
>  - The proposal may be rejected if it is redundant with existing
> functionality, technically unsound, insufficiently motivated, no
> backwards compatibility story, etc.
>  - Implementation is not necessary, and is lightly discouraged.
> Feedback is less welcome once code is in hand.
>  - Purpose is to be clear about the acceptance criteria for that
> issue, e.g. concerns that the proposal may not scale or may harm
> performance
>  - Dissenting opinions must be recorded accurately. Quoting would be
> a safe practice for the Author to encourage HEP reviewers not to block
> the product of the proposal.
>
> * The testing burden and completion strategy may be ambiguous
>  - Whether the proposal affects scalability may not be testable by
> the implementer. Completing the proposal to address all use cases may
> require considerably more work than the Author is willing or motivated
> to invest.
>  - The HEP discussion on general@ should explore whether such
> objections are merited and reasonable. For example, a particularly
> obscure/esoteric use case could be included as a condition for
> acceptance if the dissenter is willing to invest the resources to
> test/validate it. The process is flexible in this regard.
>    - But it is not infinitely flexible. Backwards compatibility,
> performance regression, availability, and other considerations need
> not be called out in every HEP.
>    - Traditional concerns need to be documented. Acceptance criteria
> should ideally be automated and reproducible in different
> organizations
>
> == Branching ==
> * A patch and a branch are isomorphic from a policy perspective. Of
> course, they are functionally distinct: branches are easier to
> collaborate on and are, generally, longer-lived than are patches. But
> special policies need not be derived to account for these differences,
> which concern the production of the code, not its review and
> acceptance.
> * Some developers find branches to be easier to review than very large
> patches and easier to merge, given a toolchain that supports this.
>  - Subversion currently is difficult to adapt to this model
>  - Could be done on a HEP-by-HEP basis, as a condition for acceptance
> * Eclipse Labs
>  - Branded version of Google Code (same functionality, w/ Eclipse brand)
>  - Not official Eclipse projects, but associated with Eclipse
>  - Apache/Hadoop may consider a similar strategy
>  - Distinct from Apache Labs, as one need not be a committer, follow
> its rules for releases, etc.
>
> == Contrib ==
> * Modules (such as fuse-dfs) are not actively maintained in the main
> repository and would benefit from a release schedule decoupled from
> the rest of Hadoop
> * With few exceptions, the contrib modules have smaller, often
> discrete groups of maintainers. It may be worth exploring whether
> these projects could live elsewhere
>



-- 
Deepak Sharma
http://www.linkedin.com/in/rikindia
