Posted to general@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2010/03/14 17:28:46 UTC

Less drastic ways

Hi,

Consider this just an email to clarify things for Otis (and maybe a few other people).

Are the following the main goals of the recent merge voting thread(s)?
* Make it easier for Solr to ride the Lucene trunk
* Make it easier for people to avoid committing new features to Solr when they really belong to some lower level code - either Lucene core or some Lucene module

Is the only or main change being proposed that lucene-dev and solr-dev move to some common-dev (or lucene-dev)?

If the above is correct, here is what I don't understand:
* Why can't Solr riding on Lucene trunk be achieved by getting Lucene trunk build into Solr lib in svn on a daily/hourly basis?
* Why can't existing Solr functionality that has been identified as "should really have been committed to Lucene instead of Solr" be moved to Lucene over the coming months?
* Why can't Solr developers be required to be subscribed to lucene-dev?
* Why can't Solr developers be required/urged to commit any new functionality to Lucene if solr-dev and lucene-dev people think that's where it belongs? i.e. communicate before committing - the same as "measure twice, cut once".

Thanks,
Otis


Re: Less drastic ways

Posted by Chris Hostetter <ho...@fucit.org>.
: Or, two committerships, two projects.  So, if the existing committer structure
: were acceptable, then Solr would be split to a separate TLP. That is the more
: common direction, for things to be split into separate TLPs as they grow.
: Merging is an unusual experiment.

In trying to look at the situation impartially, i think what really makes 
the Lucene-Java/Solr situation unique (in at least as much as i understand 
how other TLPs have formed in the past) is that while the user communities 
of the two "products" have diverged (or in many ways, never really been 
aligned) the developer communities and feature sets have in many ways 
converged over time.  Lucene begat Hadoop because the Hadoop 
users/developers diverged from the Lucene users/developers and the Hadoop 
features diverged from the Lucene (ie: Nutch) features and use cases.  

As i understand it, something similar happened with HTTPD/APR: APR's 
developer community (and certainly its user communities) started having 
less and less overlap with people who were focused on building a 
webserver.

In our case, more and more Lucene-Java developers started wanting to work 
on the guts of Solr, and more and more developers as a whole wanted to see 
better melding/refactoring of functionality across Solr/Lucene-Java.
In terms of "getting things done", creating a new TLP wouldn't have really 
helped that situation; it would have just simplified the politics.

Regardless of how ugly it was to watch the sausage getting made (and 
regardless of my personal frustration with the voting insanity) i think 
that in the short term, and medium term, the decision to "tear down the 
walls" dividing the two developer communities will certainly help improve 
both products and eliminate duplication of efforts ... even if in the long 
term it is ultimately decided that Solr should be its own TLP.



-Hoss


Re: Less drastic ways

Posted by Doug Cutting <cu...@apache.org>.
Grant Ingersoll wrote:
> As
> you've seen by the Board's indication, they only view that there
> should be a single Lucene project.  One committership, one project.

Or, two committerships, two projects.  So, if the existing committer 
structure were acceptable, then Solr would be split to a separate TLP. 
That is the more common direction, for things to be split into separate 
TLPs as they grow.  Merging is an unusual experiment.

Doug

Re: Less drastic ways

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Grant,
 
> Yeah, I agree this would be a good thing and would make sense in the near
> future.  I think we should iron out the TLP stuff first and we should let
> Nutch discuss what its plans for modularization are.  But yeah, I'm
> definitely open to discussing it.
> 
> I think we sometimes also forget about the fact that we can always SVN revert
> things, too, if someone makes a commit that doesn't work out.
> 
> -Grant 

Cool, sounds good. Back to eating my banana pancakes!

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



Re: Less drastic ways

Posted by Grant Ingersoll <gs...@apache.org>.
On Mar 14, 2010, at 12:51 PM, Mattmann, Chris A (388J) wrote:

> Hey Grant,
> 
>> Also, I really don't see what is so drastic about the proposal.  All we're
>> doing is making it easier for code to be put in the right place.  We're not
>> having Lucene consumed by Solr nor vice versa.  As you've seen by the Board's
>> indication, they only view that there should be a single Lucene project.  One
>> committership, one project.
> 
> Along those lines, rather than just give the solr-devs karma to lucene and
> vice versa with the lucene-devs, might it make sense to start discussing
> giving all PMC members and all committers on sub projects (perhaps modulo
> Tika and modulo Mahout since they are on the way into TLP-ville it seems)
> karma to a single Lucene code base that we are now moving towards?
> 
> This might make sense since folks who've earned committer rights know what
> to touch and what not to, and also it might pool together resources of
> people that all are contributing towards the overall Lucene TLP.

Yeah, I agree this would be a good thing and would make sense in the near future.  I think we should iron out the TLP stuff first and we should let Nutch discuss what its plans for modularization are.  But yeah, I'm definitely open to discussing it.

I think we sometimes also forget about the fact that we can always SVN revert things, too, if someone makes a commit that doesn't work out.

-Grant

Re: Less drastic ways

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Grant,

> Also, I really don't see what is so drastic about the proposal.  All we're
> doing is making it easier for code to be put in the right place.  We're not
> having Lucene consumed by Solr nor vice versa.  As you've seen by the Board's
> indication, they only view that there should be a single Lucene project.  One
> committership, one project.

Along those lines, rather than just give the solr-devs karma to lucene and
vice versa with the lucene-devs, might it make sense to start discussing
giving all PMC members and all committers on sub projects (perhaps modulo
Tika and modulo Mahout since they are on the way into TLP-ville it seems)
karma to a single Lucene code base that we are now moving towards?

This might make sense since folks who've earned committer rights know what
to touch and what not to, and also it might pool together resources of
people that all are contributing towards the overall Lucene TLP.

Cheers,
Chris




> 
> 
> On Mar 14, 2010, at 12:28 PM, Otis Gospodnetic wrote:
> 
>> Hi,
>> 
>> Consider this just an email to clarify things for Otis (and maybe a few other
>> people).
>> 
>> Are the following the main goals of the recent merge voting thread(s)?
>> * Make it easier for Solr to ride the Lucene trunk
>> * Make it easier for people to avoid committing new features to Solr when
>> they really belong to some lower level code - either Lucene core or some
>> Lucene module
>> 
>> Is the only or main change being proposed that lucene-dev and solr-dev move
>> to some common-dev (or lucene-dev)?
>> 
>> If the above is correct, here is what I don't understand:
>> * Why can't Solr riding on Lucene trunk be achieved by getting Lucene trunk
>> build into Solr lib in svn on a daily/hourly basis?
>> * Why can't existing Solr functionality that has been identified as "should
>> really have been committed to Lucene instead of Solr" be moved to Lucene over
>> the coming months?
>> * Why can't Solr developers be required to be subscribed to lucene-dev?
>> * Why can't Solr developers be required/urged to commit any new functionality
>> to Lucene if solr-dev and lucene-dev people think that's where it belongs?
>> i.e. communicate before committing - the same as "measure twice, cut once".
>> 
>> Thanks,
>> Otis
>> 
> 
> 
> 





Re: Less drastic ways

Posted by Grant Ingersoll <gs...@apache.org>.
Also, I really don't see what is so drastic about the proposal.  All we're doing is making it easier for code to be put in the right place.  We're not having Lucene consumed by Solr nor vice versa.  As you've seen by the Board's indication, they only view that there should be a single Lucene project.  One committership, one project.


On Mar 14, 2010, at 12:28 PM, Otis Gospodnetic wrote:

> Hi,
> 
> Consider this just an email to clarify things for Otis (and maybe a few other people).
> 
> Are the following the main goals of the recent merge voting thread(s)?
> * Make it easier for Solr to ride the Lucene trunk
> * Make it easier for people to avoid committing new features to Solr when they really belong to some lower level code - either Lucene core or some Lucene module
> 
> Is the only or main change being proposed that lucene-dev and solr-dev move to some common-dev (or lucene-dev)?
> 
> If the above is correct, here is what I don't understand:
> * Why can't Solr riding on Lucene trunk be achieved by getting Lucene trunk build into Solr lib in svn on a daily/hourly basis?
> * Why can't existing Solr functionality that has been identified as "should really have been committed to Lucene instead of Solr" be moved to Lucene over the coming months?
> * Why can't Solr developers be required to be subscribed to lucene-dev?
> * Why can't Solr developers be required/urged to commit any new functionality to Lucene if solr-dev and lucene-dev people think that's where it belongs? i.e. communicate before committing - the same as "measure twice, cut once".
> 
> Thanks,
> Otis
> 



Re: Less drastic ways

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sun, Mar 14, 2010 at 4:29 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:

> Even if we merge Lucene/Solr and we treat Solr as just another
> Lucene contrib/module, say, contributors who care only about Solr
> will still patch against Solr and Lucene developers or those people
> who have the itch for that functionality being in Lucene, too, will
> still have to poach/refactor and pull that functionality in Lucene
> later on.

Yes, people with their respective itches can still create Solr-only
and Lucene-only functions, after the merge.  We should not block any
feature from going in solely because it's not "factored" so that both
Lucene & Solr can use it.

But, no, poaching is no longer needed with merged dev -- we are free
to efficiently refactor at that point.  Merged, we don't need to have
full copies of the code in two projects, await releases to de-dup,
etc. -- code can just freely move back and forth within the project.
It's also more likely that someone wearing a Lucene hat will see the
Solr work going on and jump in and help to make it work in Lucene.

Merged dev makes refactoring much more efficient than poaching across
project lines.  Both achieve the same goals with time, it's just that
poaching is a much slower/more wasteful way to achieve it... (but of
course is the only option for disparate projects, eg, pulling stuff
from Nutch down into Lucene).

> Whether Solr is a separate project or a Lucene
> contrib/module that has its own user (and contributor) community
> that is not tightly integrated with Lucene's -dev community, the
> same thing will happen, no?

True, but much less efficiently (if we can only poach across project
lines).

> Maybe it will help if we made things visual for us visual peeps.  Is
> this, roughly, what the plan is:
>
> trunk/
>    lucene-core/
>    modules/
>        analysis/
>        wordnet/
>        spellchecker/
>        whatever/
>        ...
>        facets/
>        ...
>        functions/
>        solr/
>            dih/
>            ...

I honestly don't know what module structure we'll come up with!  It's
tbd'd....

But this looks like a good start :)

I think we'd also have a queryparser module (we have like 7 of them,
according to Robert ;), a queries module (I'd think functions lives
inside there).

Mike

Re: Less drastic ways

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I don't get it, Mike. :)
Even if we merge Lucene/Solr and we treat Solr as just another Lucene contrib/module, say, contributors who care only about Solr will still patch against Solr and Lucene developers or those people who have the itch for that functionality being in Lucene, too, will still have to poach/refactor and pull that functionality in Lucene later on.  Whether Solr is a separate project or a Lucene contrib/module that has its own user (and contributor) community that is not tightly integrated with Lucene's -dev community, the same thing will happen, no?


Maybe it will help if we made things visual for us visual peeps.  Is this, roughly, what the plan is:

trunk/
    lucene-core/
    modules/
        analysis/
        wordnet/
        spellchecker/
        whatever/
        ...
        facets/
        ...
        functions/
        solr/
            dih/
            ...

?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Michael McCandless <lu...@mikemccandless.com>
> To: general@lucene.apache.org
> Sent: Sun, March 14, 2010 4:34:42 PM
> Subject: Re: Less drastic ways
>
> > Hm, again I'm confused.  If this is how it worked in Solr/Lucene
> > land, then there wouldn't be pieces in Solr that we now want to
> > refactor and move into Lucene core or modules.  A list of about 4-5
> > such pieces of functionality in Solr has already been listed.
> > That's really my main question.  Why were/can't things be committed
> > to the appropriate place?  Why were they committed to Solr?
>
> Pre-merge:
>
> If someone wants a new functionality in Solr they should be free to
> create a patch to make it work well, in Solr, alone.
>
> To expect them to also factor it so that it works well for Lucene-only
> users is wrong.  They should not need to, nor be expected to, and they
> shouldn't feel bad not having factored it that way.  They use Solr and
> they need it working in Solr and that was their itch and they
> scratched it and net/net that was a great step forward for Solr.  We
> should not up and reject contributions because they are not well
> factored for the two projects.  Beggars can't be choosers...
>
> Someone who later has the itch for this functionality in Lucene should
> then be fully free to pick it up, refactor, and make it work in Lucene
> alone, by poaching it (pulling it into Lucene).
>
> Poaching is a natural way for code to be pulled across projects... and
> while in the short term it'd result in code dup, in the long term this
> is how refactoring can happen across projects.  It's completely normal
> and fine, in my opinion.
>
> But poaching, while effective, is slow ... Lucene would poach, have
> to stabilize & do a release, Solr would have to upgrade and then fix
> to cut over to Lucene's sources (assuming the sources hadn't
> diverged too much, else Solr would have to wait for Lucene's next
> release, etc.)
>
> And we have *a lot* of modules to refactor here, between Solr and
> Lucene.
>
> So for these two reasons I vote for merging Solr/Lucene dev over gobs
> of poaching.  That gives us complete freedom to quickly move the code
> around.
>
> Poaching should still be perfectly fine for other cases, like pulling
> analyzers from Nutch, from other projects, etc.
>
> Mike

Re: Less drastic ways

Posted by Michael McCandless <lu...@mikemccandless.com>.
> Hm, again I'm confused.  If this is how it worked in Solr/Lucene
> land, then there wouldn't be pieces in Solr that we now want to
> refactor and move into Lucene core or modules.  A list of about 4-5
> such pieces of functionality in Solr has already been listed.
> That's really my main question.  Why were/can't things be committed
> to the appropriate place?  Why were they committed to Solr?

Pre-merge:

If someone wants a new functionality in Solr they should be free to
create a patch to make it work well, in Solr, alone.

To expect them to also factor it so that it works well for Lucene-only
users is wrong.  They should not need to, nor be expected to, and they
shouldn't feel bad not having factored it that way.  They use Solr and
they need it working in Solr and that was their itch and they
scratched it and net/net that was a great step forward for Solr.  We
should not up and reject contributions because they are not well
factored for the two projects.  Beggars can't be choosers...

Someone who later has the itch for this functionality in Lucene should
then be fully free to pick it up, refactor, and make it work in Lucene
alone, by poaching it (pulling it into Lucene).

Poaching is a natural way for code to be pulled across projects... and
while in the short term it'd result in code dup, in the long term this
is how refactoring can happen across projects.  It's completely normal
and fine, in my opinion.

But poaching, while effective, is slow ... Lucene would poach, have
to stabilize & do a release, Solr would have to upgrade and then fix
to cut over to Lucene's sources (assuming the sources hadn't
diverged too much, else Solr would have to wait for Lucene's next
release, etc.)

And we have *a lot* of modules to refactor here, between Solr and
Lucene.

So for these two reasons I vote for merging Solr/Lucene dev over gobs
of poaching.  That gives us complete freedom to quickly move the code
around.

Poaching should still be perfectly fine for other cases, like pulling
analyzers from Nutch, from other projects, etc.

Mike

Re: Less drastic ways

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

----- Original Message ----
> From: Grant Ingersoll <gs...@apache.org>
> To: general@lucene.apache.org
> Sent: Sun, March 14, 2010 12:40:51 PM
> Subject: Re: Less drastic ways
>
> On Mar 14, 2010, at 12:28 PM, Otis Gospodnetic wrote:
>
> Hi,
>
> Consider this just an email to clarify things for Otis (and maybe a few other people).
>
> Are the following the main goals of the recent merge voting thread(s)?
> * Make it easier for Solr to ride the Lucene trunk
> * Make it easier for people to avoid committing new features to Solr when they really belong to some lower level code - either Lucene core or some Lucene module
>
> Is the only or main change being proposed that lucene-dev and solr-dev move to some common-dev (or lucene-dev)?
>
> If the above is correct, here is what I don't understand:
> * Why can't Solr riding on Lucene trunk be achieved by getting Lucene trunk build into Solr lib in svn on a daily/hourly basis?

GI: I just don't see that working.

OG: Could you please elaborate?  Also, why not try it and see?  It requires very few infrastructure changes and no reorg.  Reorg can always be done later if this first step proves to be inadequate.

> * Why can't existing Solr functionality that has been identified as "should really have been committed to Lucene instead of Solr" be moved to Lucene over the coming months?

GI: First up is analysis, I suspect.

OG: Si!

> * Why can't Solr developers be required to be subscribed to lucene-dev?

GI: They should.  That's the immediate step going forward until the various infra gyrations are undertaken.

> * Why can't Solr developers be required/urged to commit any new functionality to Lucene if solr-dev and lucene-dev people think that's where it belongs? i.e. communicate before committing - the same as "measure twice, cut once".

GI: Of course they will.  This is how committing works on any and all projects anyway.

OG: Hm, again I'm confused.  If this is how it worked in Solr/Lucene land, then there wouldn't be pieces in Solr that we now want to refactor and move into Lucene core or modules.  A list of about 4-5 such pieces of functionality in Solr has already been listed.  That's really my main question.  Why were/can't things be committed to the appropriate place?  Why were they committed to Solr?

Thanks,
Otis


Re: Less drastic ways

Posted by Grant Ingersoll <gs...@apache.org>.
On Mar 14, 2010, at 12:28 PM, Otis Gospodnetic wrote:

> Hi,
> 
> Consider this just an email to clarify things for Otis (and maybe a few other people).
> 
> Are the following the main goals of the recent merge voting thread(s)?
> * Make it easier for Solr to ride the Lucene trunk
> * Make it easier for people to avoid committing new features to Solr when they really belong to some lower level code - either Lucene core or some Lucene module
> 
> Is the only or main change being proposed that lucene-dev and solr-dev move to some common-dev (or lucene-dev)?
> 
> If the above is correct, here is what I don't understand:
> * Why can't Solr riding on Lucene trunk be achieved by getting Lucene trunk build into Solr lib in svn on a daily/hourly basis?

I just don't see that working.

> * Why can't existing Solr functionality that has been identified as "should really have been committed to Lucene instead of Solr" be moved to Lucene over the coming months?

First up is analysis, I suspect.

> * Why can't Solr developers be required to be subscribed to lucene-dev?

They should.  That's the immediate step going forward until the various infra gyrations are undertaken.

> * Why can't Solr developers be required/urged to commit any new functionality to Lucene if solr-dev and lucene-dev people think that's where it belongs? i.e. communicate before committing - the same as "measure twice, cut once".

Of course they will.  This is how committing works on any and all projects anyway.