Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2010/07/01 19:08:29 UTC

Revisions to Incubator proposal

Greets,

We're on schedule.  We have a draft document, and it's looking good!

Even better, the exercise of drawing up this document has produced exactly the
benefits we hoped it would.  We have deepened our understanding of Apache
values, traditions, rules, and infrastructure.  We have refined our plan for
Lucy's growth.  

There is a possibility that this proposal won't be accepted.  But if that
happens, we have a contingency plan: leave Apache, yet continue to operate
according to many of the same procedures and values, possibly returning with a
new proposal at some later date.  We should seek to exploit any critiques that
we receive during the proposal process to help guide us, regardless of the
outcome -- just as we have attempted to make best use of the Lucene PMC's advice.

Having slept on the matter, here are some thoughts on the current proposal.

The first half of the Rationale seems fine...

    There is great hunger for a search engine library in the mode of Lucene
    which is accessible from various dynamic languages, and for one accessible
    from pure C. Individuals naturally wish to code in their language of
    choice. Organizations which do not have significant Java expertise may not
    want to support Java strictly for the sake of running a Lucene instance.
    Native applications can be launched much more quickly than JVMs. Lucy will
    meet all these demands. 

... but I'm dissatisfied with the second half:

    We acknowledge that Apache seems like a natural home for Lucy given that
    it is also the home of Lucene, and speculate that this may have been on
    the minds of the Lucene PMC when Lucy was green-lighted as a sub-project.
    More importantly, though, the Lucy development community strongly believes
    that The Apache Way is right for Lucy. 

First, this passage only asserts that we believe in The Apache Way rather than
demonstrating our understanding of it.  We should "show, not tell".  Second,
we should purge the PMC mind reading.  Who knows what they were thinking! :)
Third, I don't want to leave in any mention of Lucy belonging at Apache because
Lucene is there, too.  That's Lucy sponging off the Lucene brand, and it's not
a benefit to Apache.  We should just leave that unstated and stand on our
merits.

Here's the directive for Rationale from the proposal template:

    Explains why this project needs to exist and why should it be adopted by
    Apache. This is the right place for discursive material. 

We've covered "why this project needs to exist", but not "why it should be
adopted by Apache".  IMO the rest of the proposal does a fine job of
illustrating that point, but we need to do a better job here.  We should
examine the Rationale sections from other proposals to get a better sense of
how that might be best expressed.

The Community section does a fine job of identifying our challenges and
presenting a plan:

    Lucy currently has a small community, most members of which originated in the
    KinoSearch community.

    Lucy's chief challenge is growing its community, which it hopes to achieve
    through efforts in two areas: reaching a 1.0 release, and actively reaching
    out to its target audience, users and developers in the dynamic language
    communities who want a fast, scalable full-text search solution in their
    native language. 

Still, I think we deserve a little more credit.  We've taken a lot of flak
regarding the size of the Lucy community, but you know, if you consider how
the *KinoSearch* community has operated over the years, we haven't done so
bad.  Looking back at how Nate and I hashed out mmap support and the current
Query hierarchy, or how Chris Nandor and I collaborated on range queries, or
how Peter and I have intensified our collaboration over the last year, I see
healthy processes of negotiation and consensus building.

The one thing I don't think we've done well (and this is my fault) is handle
releases and backwards compatibility.  Father Chrysostomos put in an awful lot
of work creating a subclassable Highlighter, but later releases of KS broke
back compat on him.  Nevertheless, I think we have learned from how that
played out, and that the backwards compatibility policy we arrived at last
year goes a long way towards solving those problems.  

So, I'd like to take a look at what other projects have put in the Community
section, and maybe mod it to convey a more boffo outlook.

Another section that could stand some revision is External Dependencies.  IMO,
it provides too much detail.  We should just state that we require some CPAN
modules, but we have plans in place to eliminate them.

Lastly, it would be nice to cover our contingency plan of growing the
community and coming back with a bigger committer list at some later date.
However, I think that may arise naturally during the discussion, and it's
probably too big a topic to squeeze in.

Marvin Humphrey


Re: Revisions to Incubator proposal

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Jul 01, 2010 at 11:44:46PM -0700, Nathan Kurz wrote:
> >    There is great hunger for a search engine library in the mode of Lucene
> >    which is accessible from various dynamic languages, and for one accessible
> >    from pure C. Individuals naturally wish to code in their language of
> >    choice. Organizations which do not have significant Java expertise may not
> >    want to support Java strictly for the sake of running a Lucene instance.
> >    Native applications can be launched much more quickly than JVMs. Lucy will
> >    meet all these demands.
> 
> I like this, but might stop after the first two sentences.  I think
> they're the stronger reasons.  If you wanted to go further, I'd
> mention the ways in which C has traditionally been the language of
> Unix server programming, and allows one to take full advantage of the
> machine's potential.

OK, I went with this for today:

  Developers may want to take advantage of C's interoperability and
  fine-grained control. 

I do think it's important to at least touch on the advantages of natively
compiled code in the Rationale, and not limit ourselves to the
language-loyalty angle.

> While Lucy certainly can take advantage of the Apache infrastructure, I'm
> not sure it really needs it.  

Acknowledged.  However, the more successful Lucy gets, the more useful it is
to have the support systems and institutions of Apache protecting it and
instilling confidence.  And it's not just about backups, uptime, and fending
off crackers -- it's things like stable governance, legal protection
mechanisms, encouraging contributions from developers sponsored by major
corporations, and so on.

> What it needs is more people writing code for it, and the best way to
> achieve this is probably to get more technically proficient users.

I agree.  That is our most pressing need, so we should prioritize accordingly.

> I think you're doing fine on backwards compatibility, and if anything
> you're spending too much time on it.  Instead of worrying about not
> breaking things that are old, spend the effort making it easier to
> write things that are new.

I believe that the new backwards compatibility policy properly privileges
active developers, while still offering strong stability guarantees to those
who require them.  It's not perfect, but I think it strikes a good balance.  

> Let's make the dogfood tasty, and start writing 'extensions' instead
> of trying to include everything in core.  

A public C API will facilitate extensions.  Had one been available,
ProximityQuery would have been written to be a separate CPAN distro.

I think separating ProximityQuery from core would be a good concrete goal to
help us focus while we design the C API.

> I don't know if you need to discuss this.  And once we have more
> developers, do we really need to come back?   I mean, I really like
> Apache, but Trac plus taking the best points of the Incubator approach
> might offer 90% of the benefit with a lot less overhead.  Which
> doesn't mean Lucy wouldn't benefit from being included, but I wouldn't
> precommit to returning at a later date if they don't want us.

Your response helps to clarify the point that I really wanted to get in there,
which is that a long overdue release is imminent.  That's now covered in the
revised Community section.

OK, I only made that one minor content change, so I think we have adequate
consensus.  I'm going to perform a QA pass and maybe tweak some phrasings so
that the word "community" doesn't appear eleventy-billion times.  Then I'll
send it off to general@lucene.

Thanks, folks -- this has worked out well!

Marvin Humphrey


Re: Revisions to Incubator proposal

Posted by Nathan Kurz <na...@verse.com>.
On Thu, Jul 1, 2010 at 10:08 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
> There is a possibility that this proposal won't be accepted.  But if that
> happens, we have a contingency plan: leave Apache, yet continue to operate
> according to many of the same procedures and values, possibly returning with a
> new proposal at some later date.  We should seek to exploit any critiques that
> we receive during the proposal process to help guide us, regardless of the
> outcome -- just as we have attempted to make best use of the Lucene PMC's advice.
>
> Having slept on the matter, here are some thoughts on the current proposal.
>
> The first half of the Rationale seems fine...
>
>    There is great hunger for a search engine library in the mode of Lucene
>    which is accessible from various dynamic languages, and for one accessible
>    from pure C. Individuals naturally wish to code in their language of
>    choice. Organizations which do not have significant Java expertise may not
>    want to support Java strictly for the sake of running a Lucene instance.
>    Native applications can be launched much more quickly than JVMs. Lucy will
>    meet all these demands.

I like this, but might stop after the first two sentences.  I think
they're the stronger reasons.  If you wanted to go further, I'd
mention the ways in which C has traditionally been the language of
Unix server programming, and allows one to take full advantage of the
machine's potential.

I think this is essentially what made Apache a great web server:  a
brilliant architecture combined with the clarity of C.  Can you
picture writing mod_php, mod_perl, or mod_python in Java?  It's
certainly not the only way to go, but fills a niche that Lucene never
will.

>    We acknowledge that Apache seems like a natural home for Lucy given that
>    it is also the home of Lucene, and speculate that this may have been on
>    the minds of the Lucene PMC when Lucy was green-lighted as a sub-project.
>    More importantly, though, the Lucy development community strongly believes
>    that The Apache Way is right for Lucy.

I agree that this part is somewhat weak. And while I really like
Apache, I'm not sure it's the only way for Lucy.  For example, I also
really like the SQLite way.  While Lucy certainly can take advantage
of the Apache infrastructure, I'm not sure it really needs it.  What
it needs is more people writing code for it, and the best way to
achieve this is probably to get more technically proficient users.
If the code is clear (which it is) and the architecture is graspable
(needs improvement?) some significant percentage of these users will
contribute.

> The one thing I don't think we've done well (and this is my fault) is handle
> releases and backwards compatibility.  Father Chrysostomos put in an awful lot
> of work creating a subclassable Highlighter, but later releases of KS broke
> back compat on him.  Nevertheless, I think we have learned from how that
> played out, and that the backwards compatibility policy we arrived at last
> year goes a long way towards solving those problems.

I think you're doing fine on backwards compatibility, and if anything
you're spending too much time on it.  Instead of worrying about not
breaking things that are old, spend the effort making it easier to
write things that are new.

Let's make the dogfood tasty, and start writing 'extensions' instead
of trying to include everything in core.  If it's easy enough and
useful enough, someone will port the old to the new.  In fact, that
would be an excellent project for a beginner to familiarize themselves
with the code base.

> Lastly, it would be nice to cover our contingency plan of growing the
> community and coming back with a bigger committer list at some later date.
> However, I think that may arise naturally during the discussion, and it's
> probably too big a topic to squeeze in.

I don't know if you need to discuss this.  And once we have more
developers, do we really need to come back?   I mean, I really like
Apache, but Trac plus taking the best points of the Incubator approach
might offer 90% of the benefit with a lot less overhead.  Which
doesn't mean Lucy wouldn't benefit from being included, but I wouldn't
precommit to returning at a later date if they don't want us.

--nate

Re: [Lucy] Revisions to Incubator proposal

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 7/1/10 12:08 PM:

> ... but I'm dissatisfied with the second half:
> 
>     We acknowledge that Apache seems like a natural home for Lucy given that
>     it is also the home of Lucene, and speculate that this may have been on
>     the minds of the Lucene PMC when Lucy was green-lighted as a sub-project.
>     More importantly, though, the Lucy development community strongly believes
>     that The Apache Way is right for Lucy. 
> 
> First, this passage only asserts that we believe in The Apache Way rather than
> demonstrating our understanding of it.  We should "show, not tell".  Second,
> we should purge the PMC mind reading.  Who knows what they were thinking! :)
> Third, I don't want to leave in any mention of Lucy belonging at Apache because
> Lucene is there, too.  That's Lucy sponging off the Lucene brand, and it's not
> a benefit to Apache.  We should just leave that unstated and stand on our
> merits.

+1 to all your points. Nuke that second half.


> The Community section does a fine job of identifying our challenges and
> presenting a plan:
> 
>     Lucy currently has a small community, most members of which originated in the
>     KinoSearch community.
> 
>     Lucy's chief challenge is growing its community, which it hopes to achieve
>     through efforts in two areas: reaching a 1.0 release, and actively reaching
>     out to its target audience, users and developers in the dynamic language
>     communities who want a fast, scalable full-text search solution in their
>     native language. 
> 
> Still, I think we deserve a little more credit.  We've taken a lot of flak
> regarding the size of the Lucy community, but you know, if you consider how
> the *KinoSearch* community has operated over the years, we haven't done so
> bad.

+1 for mentioning KS, since *that* is the code that is being donated.

> The one thing I don't think we've done well (and this is my fault) is handle
> releases and backwards compatibility.

I agree with what Nate said on this. The perfect is the enemy of the good.


> Lastly, it would be nice to cover our contingency plan of growing the
> community and coming back with a bigger committer list at some later date.
> However, I think that may arise naturally during the discussion, and it's
> probably too big a topic to squeeze in.
> 

+1 on Nate's comments here.

I'm afk most of the day today (Friday), Marvin. I'm fine with the proposal as
it is written at the moment; it looks like you've already addressed most of the
points above in your edits from last night.

cheers on a hard week's work!

pek


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com