Posted to mapreduce-dev@hadoop.apache.org by Eli Collins <el...@cloudera.com> on 2010/06/01 21:25:07 UTC

Re: Contributor Meeting Minutes 05/28/2010

I posted a link to the slides on the wiki:

http://wiki.apache.org/hadoop/HadoopContributorsMeeting20100528

On Fri, May 28, 2010 at 7:59 PM, Eli Collins <el...@cloudera.com> wrote:
> Slides attached. Thanks for taking notes, Chris!
>
>
> On Fri, May 28, 2010 at 5:37 PM, Chris Douglas <cd...@apache.org> wrote:
>> This month, the MapReduce + HDFS contributor meeting was held at
>> Cloudera Headquarters.
>>
>> Announcements for contributor meetings are here:
>> http://www.meetup.com/Hadoop-Contributors/
>>
>> Minutes follow. No decisions were made at this meeting, but the
>> following issues were discussed and may presage future discussion and
>> decisions on these lists.
>>
>> Eli, I think you have all the slides. Would you mind sending them out? -C
>>
>> == 0.21 release update ==
>> * Continuing to close blockers and pinging people for updates and suggestions
>> * About 20 open blockers remain. Many are MapReduce documentation issues that
>> may be pushed to a later release. Speak up if 0.21 is missing anything substantive.
>> * Common/HDFS visibility and annotations are close to consensus;
>> MapReduce annotations are committed to trunk and the 0.21 branch
>>
>> == HEP proposal ==
>> (What follows is the sketch presented at the meeting. A full proposal
>> with concrete details will be circulated on the list.)
>>
>> * Based on, and very similar to, the PEP (Python Enhancement Proposal) process
>> * Audience is HDFS and MapReduce; not necessarily adopted by other subprojects
>>  - Addresses the perception that there is friction between
>> innovation/experimentation and stability
>> * Not for small enhancements, features, and bug fixes. This should not
>> slow down typical development or impede casual contribution to Hadoop
>> * Primary mechanism for proposing new features, collecting input, and
>> documenting design decisions
>> * JIRA is good for details, but not for deciding on wide shifts in direction
>> * Purpose is for the Author to build consensus and gather dissenting opinions.
>>  - All may comment, but Editors will review incoming HEP material
>>  - Editors determine only whether the HEP is complete, not whether
>> they believe it is a sound idea
>>  - Editors are appointed by the PMC
>>  - Mechanism for appointing Editors and term of service TBD
>>    - The Apache Board appoints Shepherds to projects somewhat randomly.
>> A similar mechanism could work for incoming HEPs
>>  - Proposal *may* come with code, but not necessarily.
>> Drafting/baking of the HEP occurs in public on a list dedicated to
>> that particular proposal. Once Editors certify the HEP as complete, it
>> is sent to general@ for wider discussion.
>>    - The discussion phase begins on general@. The dedicated HEP list exists
>> to ensure the HEP is complete enough to present to the community.
>>  - Some discussion on the difference between posting to general@ and
>> posting to the HEP list. Completeness is, of course, subjective. If
>> the Editor and Author disagree whether the proposal affects an aspect
>> of the framework enough to merit special consideration, it is not
>> entirely clear how to resolve the disagreement.
>>    - In general, the role of the Editor in the community-driven
>> process of Hadoop is not entirely clear. It may be possible to
>> optimize it out.
>>  - Once discussion ends, the HEP is passed (or fails to pass) by a
>> vote of the PMC (mechanics undefined). In Python, the result is
>> committed to the repository. A similar practice would make sense in
>> Hadoop.
>> * Which issues require HEPs?
>>  - Opinions varied. Append, the backup namenode, the edit log rewrite,
>> and similar features were cited as substantial enough to merit a HEP. Pure
>> Java CRC is an example of an enhancement that would not. Whether an
>> explicit process must be in place to determine whether an issue
>> requires a HEP is not clear.
>>  - Viewing HEPs as a way of soliciting consensus for an approach
>> might be more accurate. Going through the HEP process should always
>> improve the chances of a successful proposal
>>
>> * Evaluation
>>  - The proposal may be rejected if it is redundant with existing
>> functionality, technically unsound, insufficiently motivated, lacking a
>> backwards compatibility story, etc.
>>  - Implementation is not necessary, and is lightly discouraged.
>> Feedback is less welcome once code is in hand.
>>  - Purpose is to be clear about the acceptance criteria for that
>> issue, e.g. concerns that the proposal may not scale or may harm
>> performance
>>  - Dissenting opinions must be recorded accurately. Quoting dissenters
>> directly would be a safe practice for the Author, making HEP reviewers
>> less inclined to block the product of the proposal.
>>
>> * The testing burden and completion strategy may be ambiguous
>>  - Whether the proposal affects scalability may not be testable by
>> the implementer. Completing the proposal to address all use cases may
>> require considerably more work than the Author is willing or motivated
>> to invest.
>>  - The HEP discussion on general@ should explore whether such
>> objections are merited and reasonable. For example, a particularly
>> obscure/esoteric use case could be included as a condition for
>> acceptance if the dissenter is willing to invest the resources to
>> test/validate it. The process is flexible in this regard.
>>    - But it is not infinitely flexible. Backwards compatibility,
>> performance regression, availability, and other standing considerations
>> always apply, and need not be called out in every HEP.
>>    - Traditional concerns need to be documented. Acceptance criteria
>> should ideally be automated and reproducible in different
>> organizations
>>
>> == Branching ==
>> * A patch and a branch are isomorphic from a policy perspective. Of
>> course, they are functionally distinct: branches are easier to
>> collaborate on and are, generally, longer-lived than are patches. But
>> special policies need not be devised to account for these differences,
>> which concern the production of the code, not its review and
>> acceptance.
>> * Some developers find branches to be easier to review than very large
>> patches and easier to merge, given a toolchain that supports this.
>>  - Subversion currently is difficult to adapt to this model
>>  - Branch-based development could be required on a HEP-by-HEP basis, as a condition for acceptance
>> * Eclipse Labs
>>  - Branded version of Google Code (same functionality, w/ Eclipse brand)
>>  - Not official Eclipse projects, but associated with Eclipse
>>  - Apache/Hadoop may consider a similar strategy
>>  - Distinct from Apache Labs, as one need not be a committer, follow
>> Apache's rules for releases, etc.
>>
>> == Contrib ==
>> * Some modules (such as fuse-dfs) are not actively maintained in the main
>> repository and would benefit from a release schedule decoupled from
>> the rest of Hadoop
>> * With few exceptions, the contrib modules have smaller, often
>> discrete groups of maintainers. It may be worth exploring whether
>> these projects could live elsewhere
>>
>