
Posted to dev@lucene.apache.org by Robert Muir <rc...@gmail.com> on 2011/04/27 05:07:25 UTC

modularization discussion

Hi,

It appears there are some problems with modularization of the code,
especially between lucene and solr, so I would like for us to have a
discussion on this thread.

The two sides/takes seem to be (with some example reasons):
1. pro: for example, modularization can expose features that were
traditionally in solr to lucene users.
2. con: for example, modularization slows development of these
features and they will evolve slower if they are in lucene.

I think we need to somehow get a better understanding of both sides;
specific examples of portions of the code would be helpful, I think.
Maybe then we can arrive at a compromise so that we aren't so
frustrated about this issue.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: modularization discussion

Posted by Grant Ingersoll <gs...@gmail.com>.


On Apr 26, 2011, at 11:12 PM, Chris Male <ge...@gmail.com> wrote:

> > The two sides/takes seem to be (with some example reasons):
> > 1. pro: for example, modularization can expose features that were
> > traditionally in solr to lucene users.
> 
> Some other Pros:
> Easier to test individual pieces.  Easier to benchmark.
> More usage == more/better features/functionality for everyone.
> Easier for people to contribute to without having to know the full stack.
> I think most people agree that decoupled, reusable modules are a good thing in general as an abstract concept, but, of course, specifics matter.
> 
> > 2. con: for example, modularization slows development of these
> > features and they will evolve slower if they are in lucene.
> >
> 
> I think this needs a bit more explanation.  AIUI, the primary cause for concern is that by making something a module, you are taking a private, internal API of Solr's and now making it a public API that must be maintained (and backwards maintained) which could slow down development as one now needs to be concerned with more factors than you would if it were merely an implementation detail in Solr.
> 
> I feel this can be flipped around and seen as a pro though too.  

Agreed. Wasn't sure where to put it. Some see it as bad, some as good.

> Taking internal code and making it public can be beneficial for that code, because it forces the APIs to be examined, test coverage improved, and a general 'kicking of the tyres'.  With private internal APIs, there is always a temptation to make quick changes that meet an immediate need, rather than having to step back and take more time considering changes.  That can slow things down yes, but it definitely has its benefits.
>  
> 
> Other Cons:
> The concern was that Solr just becomes an uninteresting, empty shell that glues together modules. (I don't agree, but wanted to present what I have heard)
> 
> 
> 
> > I think we need to somehow get a better understanding of both sides,
> > specific examples of portions of the code would be helpful I think.
> > Maybe then we can arrive at a compromise so that we aren't so
> > frustrated about this issue.
> >
> -- 
> Chris Male | Software Developer | JTeam BV.| www.jteam.nl

Re: modularization discussion

Posted by Chris Male <ge...@gmail.com>.

>
> > The two sides/takes seem to be (with some example reasons):
> > 1. pro: for example, modularization can expose features that were
> > traditionally in solr to lucene users.
>
> Some other Pros:
> Easier to test individual pieces.  Easier to benchmark.
> More usage == more/better features/functionality for everyone.
> Easier for people to contribute to without having to know the full stack.
> I think most people agree that decoupled, reusable modules are a good thing
> in general as an abstract concept, but, of course, specifics matter.
>
> > 2. con: for example, modularization slows development of these
> > features and they will evolve slower if they are in lucene.
> >
>
> I think this needs a bit more explanation.  AIUI, the primary cause for
> concern is that by making something a module, you are taking a private,
> internal API of Solr's and now making it a public API that must be
> maintained (and backwards maintained) which could slow down development as
> one now needs to be concerned with more factors than you would if it were
> merely an implementation detail in Solr.
>

I feel this can be flipped around and seen as a pro though too.  Taking
internal code and making it public can be beneficial for that code, because
it forces the APIs to be examined, test coverage improved, and a general
'kicking of the tyres'.  With private internal APIs, there is always a
temptation to make quick changes that meet an immediate need, rather than
having to step back and take more time considering changes.  That can slow
things down yes, but it definitely has its benefits.


>
> Other Cons:
> The concern was that Solr just becomes an uninteresting, empty shell that
> glues together modules. (I don't agree, but wanted to present what I have
> heard)
>
>
>
> > I think we need to somehow get a better understanding of both sides,
> > specific examples of portions of the code would be helpful I think.
> > Maybe then we can arrive at a compromise so that we aren't so
> > frustrated about this issue.
> >


-- 
Chris Male | Software Developer | JTeam BV.| www.jteam.nl

Re: modularization discussion

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, Apr 26, 2011 at 11:41 PM, Grant Ingersoll <gs...@apache.org> wrote:

> I think this needs a bit more explanation.  AIUI, the primary cause for concern is that by making something a module, you are taking a private, internal API of Solr's and now making it a public API that must be maintained (and backwards maintained) which could slow down development as one now needs to be concerned with more factors than you would if it were merely an implementation detail in Solr.

This concern doesn't make sense to me: if we mark a module
experimental, we are fully free to change it, even drastically.

Pre-merge, I agree, it was a nightmare factoring code across
projects... but now that we are merged, and now that we have
@experimental, I don't understand this argument.

Maybe we can take a concrete example, eg LUCENE-2995 (factored out
"suggest" module): how does this being its own module hurt Solr?

Mike

http://blog.mikemccandless.com


Re: modularization discussion

Posted by Robert Muir <rc...@gmail.com>.

On Wed, Apr 27, 2011 at 8:13 AM, Mark Miller <ma...@gmail.com> wrote:

> The problem is that Simon says things like, everything should be a module and solr should just be sugar on Lucene. That scares Yonik. Then Yonik makes comments questioning individual modules. That scares the other guys. Both sides retreat to their corners.
>

Why? In the best interest of the project, what are the reasons why
this is a bad thing? Then users could access Solr's features from the
API.


RE: modularization discussion

Posted by Steven A Rowe <sa...@syr.edu>.

> if it's not stated as "this feature is going to Lucene"

It seems as though some people assume that since Lucene is a library, and Solr is an application, exposing a Solr API *means* making it part of Lucene.  It ain't necessarily so, and it need not be a point of contention.

I want to reiterate my opinion (voiced pre-merge) that there should be a third entity here besides Solr and Lucene.

E.g., if "modules/" became "thirdentity/", with its own org.apache.thirdentity namespace, wouldn't questions of ownership/control mostly go away?

Steve

Re: modularization discussion

Posted by Mark Miller <ma...@gmail.com>.

On Apr 27, 2011, at 12:14 AM, Robert Muir wrote:

> On Tue, Apr 26, 2011 at 11:41 PM, Grant Ingersoll <gs...@apache.org> wrote:
>> I think this needs a bit more explanation.  AIUI, the primary cause for concern is that by making something a module, you are taking a private, internal API of Solr's and now making it a public API that must be maintained (and backwards maintained) which could slow down development as one now needs to be concerned with more factors than you would if it were merely an implementation detail in Solr.
>> 
> 
> Can we solve this? It seems like for lucene users, they currently only
> have this choice:
> 
> A. no access to feature X at all
> 
> but, couldn't they at least have this choice:
> 
> A. no access to feature X at all
> B. having access to some feature, but it has relaxed backwards
> compatibility to address the concern.
> 
> In other words, we could mark the api @experimental or whatever, and
> the user can choose not to use it from a lucene level if they don't
> want to deal with upgrade hassles.

Honestly, there's too much fighting to see the forest for the trees.

Yonik has compromised on pretty much every module brought up: if it's not stated as "this feature is going to Lucene", if it goes to a module, and if the module can have similar recs as the code had in Solr, then he's okay with it. To him it's very important that some of this stuff comes off as shared between Lucene/Solr and not just Lucene's. That's what I have gathered anyway. Fine by me.

My memory is that Yonik has never been steadfast against modules. He has tried to negotiate what he thinks is best in terms of this stuff.

The breakdown comes from the personalities involved. No one has been willing to swim to the end because it's hard work. Well, some things are hard work. I say get used to it. I am.

The problem is that Simon says things like, everything should be a module and solr should just be sugar on Lucene. That scares Yonik. Then Yonik makes comments questioning individual modules. That scares the other guys. Both sides retreat to their corners.

Fantastic. Yes there is a middle ground - I've seen it swirl around and disappear back into the blood a few times. These volatile personalities are just not finding it.

- Mark


- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org


Re: modularization discussion

Posted by Robert Muir <rc...@gmail.com>.

On Tue, Apr 26, 2011 at 11:41 PM, Grant Ingersoll <gs...@apache.org> wrote:
> I think this needs a bit more explanation.  AIUI, the primary cause for concern is that by making something a module, you are taking a private, internal API of Solr's and now making it a public API that must be maintained (and backwards maintained) which could slow down development as one now needs to be concerned with more factors than you would if it were merely an implementation detail in Solr.
>

Can we solve this? It seems like for lucene users, they currently only
have this choice:

A. no access to feature X at all

but, couldn't they at least have this choice:

A. no access to feature X at all
B. having access to some feature, but it has relaxed backwards
compatibility to address the concern.

In other words, we could mark the api @experimental or whatever, and
the user can choose not to use it from a lucene level if they don't
want to deal with upgrade hassles.
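
Option B above (expose the feature, but flag it as having relaxed backwards compatibility) can be sketched in plain Java. This is a minimal sketch under stated assumptions: the runtime annotation `Experimental`, the classes `HypotheticalSuggester` and `StableApi`, and the `isExperimental` check are hypothetical names for illustration only, not Lucene or Solr APIs. It shows how a module could carry a machine-checkable "may change incompatibly" marker that a user, or a build check, can inspect before depending on it.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class ExperimentalApiDemo {

    /** Hypothetical marker: the annotated API may change incompatibly in any release. */
    @Documented
    @Retention(RetentionPolicy.RUNTIME) // kept at runtime so it can be checked via reflection
    @Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR})
    public @interface Experimental {
        /** Free-form note about why the API is still in flux. */
        String value() default "";
    }

    /** Hypothetical module class that carries only the relaxed back-compat promise. */
    @Experimental("factored out of Solr; API still in flux")
    public static class HypotheticalSuggester {
        /** Naive prefix suggestion over an in-memory dictionary (illustration only). */
        public List<String> suggest(List<String> dictionary, String prefix) {
            List<String> out = new ArrayList<>();
            for (String term : dictionary) {
                if (term.startsWith(prefix)) {
                    out.add(term);
                }
            }
            return out;
        }
    }

    /** Hypothetical class with the usual back-compat guarantees (no marker). */
    public static class StableApi {}

    /** Returns true if the class itself is marked experimental. */
    public static boolean isExperimental(Class<?> type) {
        return type.isAnnotationPresent(Experimental.class);
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] {HypotheticalSuggester.class, StableApi.class}) {
            String status = isExperimental(c) ? "experimental (may change)" : "stable";
            System.out.println(c.getSimpleName() + ": " + status);
        }
    }
}
```

Running `main` prints one line per class, flagging `HypotheticalSuggester` as experimental and `StableApi` as stable. A runtime-retained annotation is used here only so the marker can be checked by reflection; a javadoc-only tag would document the same promise to readers without being visible at runtime.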


Re: modularization discussion

Posted by Grant Ingersoll <gs...@apache.org>.

On Apr 26, 2011, at 10:07 PM, Robert Muir wrote:

> Hi,
> 
> It appears there are some problems with modularization of the code,
> especially between lucene and solr, so I would like for us to have a
> discussion on this thread.
> 

+1

> The two sides/takes seem to be (with some example reasons):
> 1. pro: for example, modularization can expose features that were
> traditionally in solr to lucene users.

Some other Pros:
Easier to test individual pieces.  Easier to benchmark.  
More usage == more/better features/functionality for everyone.  
Easier for people to contribute to without having to know the full stack.  
I think most people agree that decoupled, reusable modules are a good thing in general as an abstract concept, but, of course, specifics matter.

> 2. con: for example, modularization slows development of these
> features and they will evolve slower if they are in lucene.
> 

I think this needs a bit more explanation.  AIUI, the primary cause for concern is that by making something a module, you are taking a private, internal API of Solr's and now making it a public API that must be maintained (and backwards maintained) which could slow down development as one now needs to be concerned with more factors than you would if it were merely an implementation detail in Solr.

Other Cons:
The concern was that Solr just becomes an uninteresting, empty shell that glues together modules. (I don't agree, but wanted to present what I have heard)

> I think we need to somehow get a better understanding of both sides,
> specific examples of portions of the code would be helpful I think.
> Maybe then we can arrive at a compromise so that we aren't so
> frustrated about this issue.


Re: modularization discussion

Posted by Simon Willnauer <si...@googlemail.com>.

On Sat, May 7, 2011 at 12:30 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> I agree: refactoring is TONS of work.  Even cases that seem cut and
> dry, from a distance, quickly prove to be hairy (just ask Robert about
> refactoring analyzers).
>
> However, I think "unproven gain" is too strong.  EG, just a few days
> ago we had a user thread asking how to use auto-suggest outside of
> Solr.  Once we commit the suggest module, this is easy/ier for that
> user, and now we have one more user testing things, finding bugs,
> maybe offering improvements, etc.  I think the gains of each
> refactoring are potentially large, but they are not immediate -- they
> accrue over time.  It's an investment.
>
> Also: I'm in no way asking/expecting other devs to sign up to do
> refactoring (your response seems to imply this).  Nobody can do such a
> thing.  We all scratch our own itches and I'm not asking you to
> scratch mine :)
>
> What I am asking is that if someone wants to scratch this itch (factor
> out XXX as a module), they are fully free to do so, as long as it
> doesn't harm Solr's/Lucene's current functions, performance, etc.  We
> don't seem to have this freedom today, and this is, I think, the core
> conflict.
>
> Grant if I'm reading your response right, you agree with that freedom
> (others are free to refactor); you're just tempering in a good dose of
> reality ("refactoring is hard"), which I agree with.

Mike, thank you for this email - this is the consensus we need to have!!!

+1 for this... I think this is also what the board report should
contain, but I will reply to this separately.

simon


Re: modularization discussion

Posted by Grant Ingersoll <gs...@apache.org>.

On May 7, 2011, at 6:30 AM, Michael McCandless wrote:

> 
> Grant if I'm reading your response right, you agree with that freedom
> (others are free to refactor); you're just tempering in a good dose of
> reality ("refactoring is hard"), which I agree with.

That is exactly what I am saying.  And what Hoss said.  And what Miller said.  AFAICT.

Like I said, I'd probably start w/ function queries and then spatial, but the suggest stuff is good too.

Re: modularization discussion

Posted by Michael McCandless <lu...@mikemccandless.com>.

I agree: refactoring is TONS of work.  Even cases that seem cut and
dry, from a distance, quickly prove to be hairy (just ask Robert about
refactoring analyzers).

However, I think "unproven gain" is too strong.  EG, just a few days
ago we had a user thread asking how to use auto-suggest outside of
Solr.  Once we commit the suggest module, this is easy/ier for that
user, and now we have one more user testing things, finding bugs,
maybe offering improvements, etc.  I think the gains of each
refactoring are potentially large, but they are not immediate -- they
accrue over time.  It's an investment.

Also: I'm in no way asking/expecting other devs to sign up to do
refactoring (your response seems to imply this).  Nobody can do such a
thing.  We all scratch our own itches and I'm not asking you to
scratch mine :)

What I am asking is that if someone wants to scratch this itch (factor
out XXX as a module), they are fully free to do so, as long as it
doesn't harm Solr's/Lucene's current functions, performance, etc.  We
don't seem to have this freedom today, and this is, I think, the core
conflict.

Grant if I'm reading your response right, you agree with that freedom
(others are free to refactor); you're just tempering in a good dose of
reality ("refactoring is hard"), which I agree with.

Mike

http://blog.mikemccandless.com

On Thu, May 5, 2011 at 10:25 AM, Grant Ingersoll <gs...@apache.org> wrote:
>
> On May 5, 2011, at 4:15 AM, Simon Willnauer wrote:
>
>> Hey folks
>>
>> On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> Isn't our end goal here a bunch of well factored search modules?  Ie,
>>> fast forward a year or two and I think we should have modules like
>>> these:
>>
>> I think we have two camps here (10k feet view):
>>
>
> I'd say 3 camps:
>
>> 1. wants to move towards modularization; might support all the modules
>> Mike has listed below
>> 2. wants to stick with Solr's current architecture and remain
>> "monolithic" (not negative in this case) as much as possible
>
> 3.  Those who think most should be modularized, but realize it's a ton of work for an unproven gain (although most admit it is a highly likely gain) and should be handled on a case-by-case basis as people do the work.   I don't have anything against modularization, I just know, given my schedule, I won't be able to block off weeks of time to do it.  I'm happy to review where/when I can.
>
>
>>
>> I think we can meet somewhere in between and agree on certain modules
>> that should be available to lucene users as well. The ones I have in
>> mind are
>> primary search features like:
>> - Faceting
>
> Yeah, for instance, Bobo seems to have some interesting faceting implementations that are ASL; perhaps we can combine them into this new faceting module.
>
>> - Highlighting
>> - Suggest
>> - Function Query (consolidation is needed here!)
>> - Analyzer factories
>
> +1.
>
>>
>> things like distribution and replication should remain in solr IMO but
>> might be moved to a more extensible API so that people can add their
>> own implementation.
>
> And, of course, all the web tier stuff (response writers, inputs, etc.)
>
>> I am thinking about things like the ZooKeeper
>> support that might not be a good solution for everybody, e.g. where folks
>> already have JGroups infrastructure.
>
> Or other similar solutions.  I wonder about using a ZeroConf implementation that can do self-discovery.
>
>> So I think we can work towards 2
>> distinct goals.
>> 1. extract common search features into modules
>> 2. refactor solr to be more "elastic" / "distributed"  and extensible
>> with respect to those goals.
>
> 3. Make it easier for Solr to be programmatically configured by decoupling the reading of schema.xml and solrconfig.xml from the code that actually contains the structures for the properties (IndexSchema and SolrConfig)
>
>>
>> maybe we can get agreement on such a basis though.
>>
>> let me know what you think
>
> I think it's reasonable.  At the end of the day, it broadens the appeal of both Lucene and Solr.  Solr still exists and is not just a "shell" and at the end of the day, remains the primary choice for people who don't want to stitch everything together themselves.  All of it is easier to contribute to b/c people can focus in on the core area they know w/o having to know everything else per se.  Stuff should be better tested b/c of it as well since it will receive broader use.
>
> That being said, and not to be discouraging, I see it as a ton of work.
>
>
>
>
>>
>> simon
>>>
>>>  * Faceting
>>>
>>>  * Highlighting
>>>
>>>  * Suggest (good patch is on LUCENE-2995)
>>>
>>>  * Schema
>>>
>>>  * Query impls
>>>
>>>  * Query parsers
>>>
>>>  * Analyzers (good progress here already, thanks Robert!),
>>>    incl. factories/XML configuration (still need this)
>>>
>>>  * Database import (DIH)
>>>
>>>  * Web app
>>>
>>>  * Distribution/replication
>>>
>>>  * Doc set representations
>>>
>>>  * Collapse/grouping
>>>
>>>  * Caches
>>>
>>>  * Similarity/scoring impls (BM25, etc.)
>>>
>>>  * Codecs
>>>
>>>  * Joins
>>>
>>>  * Lucene core
>>>
>>> In this future, much of this code came from what is now Solr and
>>> Lucene, but we should freely and aggressively poach from other
>>> projects when appropriate (and license/provenance is OK).
>>>
>>> I keep seeing all these cool "compressed int set" projects popping
>>> up... surely these are useful for us.  Solr poached a doc set impl
>>> from Nutch; probably there's other stuff to poach from Nutch, Mahout,
>>> etc.
>>>
>>> Katta's doing something sweet with distribution/replication; let's
>>> poach & merge w/ Solr's approach.  There are various facet impls out
>>> there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
>>> with Solr's.
>>>
>>> Elastic Search has lots of cool stuff, too, under ASL2.
>>>
>>> All these external open-source projects are fair game for poaching and
>>> refactoring into shared modules, along with what is now Solr and
>>> Lucene sources.
>>>
>>> In this ideal future, Solr becomes the bundling and default/example
>>> configuration of the Web App and other modules, much like how the
>>> various Linux distros bundle different stuff together around the Linux
>>> kernel.  And if you are an advanced app and don't need the webapp
>>> part, you can cherry pick the huper duper modules you do need and
>>> directly embed them into your app.
>>>
>>> Isn't this the future we are working towards?
>>>
>>> Mike
>>>
>>> http://blog.mikemccandless.com
> --------------------------
> Grant Ingersoll
> Lucene Revolution -- Lucene and Solr User Conference
> May 25-26 in San Francisco
> www.lucenerevolution.org
>


Re: modularization discussion

Posted by Simon Willnauer <si...@googlemail.com>.

On Sat, May 7, 2011 at 1:02 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> OK I opened:
>
>    https://issues.apache.org/jira/browse/LUCENE-3079
awesome!

+1
>
> Mike
>
> http://blog.mikemccandless.com
>

Re: modularization discussion

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK I opened:

    https://issues.apache.org/jira/browse/LUCENE-3079

Mike

http://blog.mikemccandless.com

On Sat, May 7, 2011 at 6:46 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> I agree!  And I think you're saying the same thing as Grant.
>
> Ie, others are fully free to refactor stuff, as long as they don't
> hurt Solr/Lucene (functionality, performance).
>
> But you are tempering that with a nice dose of reality (successfully
> factoring out faceting will be insanely hard).
>
> I very much agree with that.
>
> And, I (and other refactor-itchers) very much want to hear the
> specific technical skepticism/concerns on a given module: that
> assessment is awesome and very useful.  In fact, I love your
> enumeration of how faceting is so well integrated into Solr so much
> that I'll go open an issue (to factor out faceting), and put your list
> in!
>
> I think this will mean, in practice, that the refactoring should
> itself proceed in baby steps.  Ie, birthing a new faceting module,
> iterating on it, etc., and then at some point cutting Solr over to it,
> are two events likely spread out substantially in time.
>
> Freedom to refactor/poach is the bread and butter of open source.
>
> Mike
>
> http://blog.mikemccandless.com

Re: modularization discussion

Posted by Michael McCandless <lu...@mikemccandless.com>.

I agree!  And I think you're saying the same thing as Grant.

Ie, others are fully free to refactor stuff, as long as they don't
hurt Solr/Lucene (functionality, performance).

But you are tempering that with a nice dose of reality (successfully
factoring out faceting will be insanely hard).

I very much agree with that.

And, I (and other refactor-itchers) very much want to hear the
specific technical skepticism/concerns on a given module: that
assessment is awesome and very useful.  In fact, I love your
enumeration of how faceting is so well integrated into Solr so much
that I'll go open an issue (to factor out faceting), and put your list
in!

I think this will mean, in practice, that the refactoring should
itself proceed in baby steps.  Ie, birthing a new faceting module,
iterating on it, etc., and then at some point cutting Solr over to it,
are two events likely spread out substantially in time.

Freedom to refactor/poach is the bread and butter of open source.

Mike

http://blog.mikemccandless.com

On Fri, May 6, 2011 at 4:35 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : To me, the third camp is just saying the proof is in the pudding.  If
> : you want to refactor, then go for it.  Just make sure everything still
> : works, which of course I know people will (but part of that means
> : actually running Solr, IMO).  Perhaps more importantly, don't get mad
> : if, when I have only one day a week to work on Lucene/Solr, I spend
> : it putting a specific feature in a specific place.  Just because
> : something can/should be modularized, doesn't mean that a person working
> : in that area must do it before they add whatever they were working on.
> : For instance, if and when function queries are a module, I will add to
> : them there and be happy to do so.  In the meantime, I will likely add to
> : them in Solr if that is something I happen to be interested in at that
> : time b/c I can certainly add a new function in a day, but I can't
> : refactor the whole module _and_ add my new function in a day.
>
> +1
>
> I want to get that printed on a t-shirt
>
> the corollary issue in my mind...
>
> I am happily in favor of code reuse and modularization in the abstract,
> and when it works in practice i'm pleasantly delighted.
>
> But when people talk about modularization as a goal, and make a laundry
> list of things in solr that people think should be refactored into modules
> (w/o showing specifics of what that module would look like) then i have a
> hard time buying into some of these ideas panning out in a way that:
>  a) is a useful module to people in and of itself
>  b) doesn't hamstring the evolution/performance in solr.
>
> To look at "faceting" as a concrete example, these are the big reasons
> faceting works so well in Solr: Solr has total control over the
> index, knows exactly when the index has changed to rebuild caches, has a
> strict schema so it can make sense of field types and
> pick faceting algos accordingly, has multi-phase distributed search
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements
> that can be made to take even more advantage of knowledge solr has because
> it "owns" the index that no one has had time to tackle)
>
> I find it really hard to picture a way that this code could be refactored
> into a reusable module in such a way that it could have an API that would
> be easily usable outside of Solr -- and when i do get a glimmer of an
> inkling of what that might look like, that vision scares me because of how
> that API might then "hobble" Solr's ability to leverage its total control
> of the underlying index to add additional performance/features.
>
> To be crystal clear: I recognize that this is *my* hangup -- I am not
> suggesting that "I am short sighted and have little imagination
> therefore this code should never be modularized."
>
> I'm trying to explain why i *personally* am hesitant and skeptical of how
> well modularizations of features like this might actually work in
> practice, and why i'm not eager to jump in and contribute on a goal whose
> end result is something that i can't fully picture (and when i can picture
> it, i'm a little scared by what i see)
>
> That doesn't mean i'm opposed to it happening -- i would love to live in
> the land of candy where houses are made of gingerbread and sugar plums
> grow on trees, I'm just too skeptical that such a land exists (or is as
> great as legend describes) to go slogging along on an epic journey to try
> and reach it -- i'm too old for that shit.
>
> I'm certainly not going to stop anyone else from going on that quest -- but
> i am entitled to voice my skepticism and concerns, just as adventuresome
> folks are entitled to ignore me.
>
>
> -Hoss

Re: modularization discussion

Posted by Mark Miller <ma...@gmail.com>.

+1, +1, +1, +1, +1, +1 - This is just what I've been trying (and seemingly failing) to express.

On May 6, 2011, at 4:35 PM, Chris Hostetter wrote:

> 
> : To me, the third camp is just saying the proof is in the pudding.  If 
> : you want to refactor, then go for it.  Just make sure everything still 
> : works, which of course I know people will (but part of that means 
> : actually running Solr, IMO).  Perhaps more importantly, don't get mad
> : if, when I have only one day a week to work on Lucene/Solr, I spend
> : it putting a specific feature in a specific place.  Just because 
> : something can/should be modularized, doesn't mean that a person working 
> : in that area must do it before they add whatever they were working on.  
> : For instance, if and when function queries are a module, I will add to 
> : them there and be happy to do so.  In the meantime, I will likely add to 
> : them in Solr if that is something I happen to be interested in at that 
> : time b/c I can certainly add a new function in a day, but I can't 
> : refactor the whole module _and_ add my new function in a day.
> 
> +1
> 
> I want to get that printed on a t-shirt
> 
> the corollary issue in my mind...
> 
> I am happily in favor of code reuse and modularization in the abstract, 
> and when it works in practice i'm pleasantly delighted.
> 
> But when people talk about modularization as a goal, and make a laundry 
> list of things in solr that people think should be refactored into modules
> (w/o showing specifics of what that module would look like) then i have a 
> hard time buying into some of these ideas panning out in a way that:
>  a) is a useful module to people in and of itself
>  b) doesn't hamstring the evolution/performance in solr.
> 
> To look at "faceting" as a concrete example, these are the big reasons
> faceting works so well in Solr: Solr has total control over the 
> index, knows exactly when the index has changed to rebuild caches, has a 
> strict schema so it can make sense of field types and 
> pick faceting algos accordingly, has multi-phase distributed search 
> approach to get exact counts efficiently across multiple shards, etc...
> (and there are still a lot of additional enhancements and improvements 
> that can be made to take even more advantage of knowledge solr has because 
> it "owns" the index that no one has had time to tackle)
> 
> I find it really hard to picture a way that this code could be refactored 
> into a reusable module in such a way that it could have an API that would 
> be easily usable outside of Solr -- and when i do get a glimmer of an 
> inkling of what that might look like, that vision scares me because of how 
> that API might then "hobble" Solr's ability to leverage its total control
> of the underlying index to add additional performance/features.
> 
> To be crystal clear: I recognize that this is *my* hangup -- I am not 
> suggesting that "I am short sighted and have little imagination 
> therefore this code should never be modularized."
> 
> I'm trying to explain why i *personally* am hesitant and skeptical of how
> well modularizations of features like this might actually work in
> practice, and why i'm not eager to jump in and contribute on a goal whose 
> end result is something that i can't fully picture (and when i can picture 
> it, i'm a little scared by what i see)
> 
> That doesn't mean i'm opposed to it happening -- i would love to live in 
> the land of candy where houses are made of gingerbread and sugar plums
> grow on trees, I'm just too skeptical that such a land exists (or is as 
> great as legend describes) to go slogging along on an epic journey to try 
> and reach it -- i'm too old for that shit.
> 
> I'm certainly not going to stop anyone else from going on that quest -- but
> i am entitled to voice my skepticism and concerns, just as adventuresome
> folks are entitled to ignore me.
> 
> 
> -Hoss

- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org

Re: modularization discussion

Posted by Chris Hostetter <ho...@fucit.org>.

: To me, the third camp is just saying the proof is in the pudding.  If 
: you want to refactor, then go for it.  Just make sure everything still 
: works, which of course I know people will (but part of that means 
: actually running Solr, IMO).  Perhaps more importantly, don't get mad
: if, when I have only one day a week to work on Lucene/Solr, I spend
: it putting a specific feature in a specific place.  Just because 
: something can/should be modularized, doesn't mean that a person working 
: in that area must do it before they add whatever they were working on.  
: For instance, if and when function queries are a module, I will add to 
: them there and be happy to do so.  In the meantime, I will likely add to 
: them in Solr if that is something I happen to be interested in at that 
: time b/c I can certainly add a new function in a day, but I can't 
: refactor the whole module _and_ add my new function in a day.

+1

I want to get that printed on a t-shirt

the corollary issue in my mind...

I am happily in favor of code reuse and modularization in the abstract, 
and when it works in practice i'm pleasantly delighted.

But when people talk about modularization as a goal, and make a laundry 
list of things in solr that people think should be refactored into modules
(w/o showing specifics of what that module would look like) then i have a 
hard time buying into some of these ideas panning out in a way that:
  a) is a useful module to people in and of itself
  b) doesn't hamstring the evolution/performance in solr.

To look at "faceting" as a concrete example, these are the big reasons
faceting works so well in Solr: Solr has total control over the 
index, knows exactly when the index has changed to rebuild caches, has a 
strict schema so it can make sense of field types and 
pick faceting algos accordingly, has multi-phase distributed search 
approach to get exact counts efficiently across multiple shards, etc...
(and there are still a lot of additional enhancements and improvements 
that can be made to take even more advantage of knowledge solr has because 
it "owns" the index that no one has had time to tackle)

I find it really hard to picture a way that this code could be refactored 
into a reusable module in such a way that it could have an API that would 
be easily usable outside of Solr -- and when i do get a glimmer of an 
inkling of what that might look like, that vision scares me because of how 
that API might then "hobble" Solr's ability to leverage its total control
of the underlying index to add additional performance/features.

To be crystal clear: I recognize that this is *my* hangup -- I am not 
suggesting that "I am short sighted and have little imagination 
therefore this code should never be modularized."

I'm trying to explain why i *personally* am hesitant and skeptical of how
well modularizations of features like this might actually work in
practice, and why i'm not eager to jump in and contribute on a goal whose 
end result is something that i can't fully picture (and when i can picture 
it, i'm a little scared by what i see)

That doesn't mean i'm opposed to it happening -- i would love to live in 
the land of candy where houses are made of gingerbread and sugar plums
grow on trees, I'm just too skeptical that such a land exists (or is as 
great as legend describes) to go slogging along on an epic journey to try 
and reach it -- i'm too old for that shit.

I'm certainly not going to stop anyone else from going on that quest -- but
i am entitled to voice my skepticism and concerns, just as adventuresome
folks are entitled to ignore me.


-Hoss

Re: modularization discussion

Posted by Grant Ingersoll <gs...@apache.org>.

On May 5, 2011, at 11:03 AM, Simon Willnauer wrote:

> On Thu, May 5, 2011 at 4:41 PM, Mark Miller <ma...@gmail.com> wrote:
>> 
>> On May 5, 2011, at 10:25 AM, Grant Ingersoll wrote:
>> 
>>> 3.  Those who think most should be modularized, but realize it's a ton of work for an unproven gain (although most admit it is a highly likely gain) and should be handled on a case-by-case basis as people do the work.   I don't have anything against modularization, I just know, given my schedule, I won't be able to block off weeks of time to do it.  I'm happy to review where/when I can.
>> 
>> +1. From what I have gathered, Grant and I come down pretty much on the same page on most of this stuff. Yeah, that means I'm reevaluating my position :) but that seems to be the case.
> 
> So this is one thing I really don't understand. You say you are in the
> 3rd camp. Guys in that camp don't have much time to do the work but
> still aren't willing to sign up for what we want to modularize.

I don't follow this leap.  (BTW, I'm actually mostly in camp #1 and a little in camp #3; I just want to make sure, based on what I've read, that all sides are represented.  I like Mike's approach, but I also know it is a ton of work and details matter.)

> Nobody is asking you to do the work; I only ask you to say "ok, I think this
> is good" and NOT sit in the way blocking others. This is really
> what the 3rd camp is about to me, but maybe I misunderstand something
> here.
> 
> Again, you are saying you are not in camp 1, but you still want to
> fiddle around with long discussions before we get anything done (and
> eventually be against it - nothing personal)

I don't think that is what Mark is saying nor is it what camp #3 is saying.  And I don't think we are fiddling w/ long discussions (it's only been a couple of days.)  This is hugely important.  We need consensus to move forward.

> because you don't have
> enough time to fit stuff in your schedule. This makes no sense to me.
> That case-by-case stuff makes me sick. Let's put some goals out and say
> "ok, this makes sense in a module, this doesn't" and let folks work on it.

To me, the third camp is just saying the proof is in the pudding.  If you want to refactor, then go for it.  Just make sure everything still works, which of course I know people will (but part of that means actually running Solr, IMO).  Perhaps more importantly, don't get mad if, when I have only one day a week to work on Lucene/Solr, I spend it putting a specific feature in a specific place.  Just because something can/should be modularized, doesn't mean that a person working in that area must do it before they add whatever they were working on.  For instance, if and when function queries are a module, I will add to them there and be happy to do so.  In the meantime, I will likely add to them in Solr if that is something I happen to be interested in at that time b/c I can certainly add a new function in a day, but I can't refactor the whole module _and_ add my new function in a day.

In the end, I think we are in agreement (at least you and me), actually.  To me, the best place to start on this is:
1. Function queries
2. Spatial
3. Faceting

(In that order)

-Grant

Re: modularization discussion

Posted by Simon Willnauer <si...@googlemail.com>.

On Thu, May 5, 2011 at 4:41 PM, Mark Miller <ma...@gmail.com> wrote:
>
> On May 5, 2011, at 10:25 AM, Grant Ingersoll wrote:
>
>> 3.  Those who think most should be modularized, but realize it's a ton of work for an unproven gain (although most admit it is a highly likely gain) and should be handled on a case-by-case basis as people do the work.   I don't have anything against modularization, I just know, given my schedule, I won't be able to block off weeks of time to do it.  I'm happy to review where/when I can.
>
> +1. From what I have gathered, Grant and I come down pretty much on the same page on most of this stuff. Yeah, that means I'm reevaluating my position :) but that seems to be the case.

So this is one thing I really don't understand. You say you are in the
3rd camp. Guys in that camp don't have much time to do the work but
still aren't willing to sign up for what we want to modularize.
Nobody is asking you to do the work; I only ask you to say "ok, I think this
is good" and NOT sit in the way blocking others. This is really
what the 3rd camp is about to me, but maybe I misunderstand something
here.

Again, you are saying you are not in camp 1, but you still want to
fiddle around with long discussions before we get anything done (and
eventually be against it - nothing personal) because you don't have
enough time to fit stuff in your schedule. This makes no sense to me.
That case-by-case stuff makes me sick. Let's put some goals out and say
"ok, this makes sense in a module, this doesn't" and let folks work on it.
We need some agreement here and I think we have written enough emails
to make our points. I think we should agree on a set of things and
once we are there we can talk again. Dreams vs. Babysteps!

Let's settle on something now, today or next week, and stop this waste of
time. I am happy with an agreement that we don't factor anything out.
All remains in Solr, but we need to move here! After all these
discussions I don't have any motivation to work on it anyway. I think I
need to step back for a while along those lines!

simon
>
> Except I'm more open to IRC discussion :)
>
> - Mark Miller
> lucidimagination.com
>
> Lucene/Solr User Conference
> May 25-26, San Francisco
> www.lucenerevolution.org

Re: modularization discussion

Posted by Mark Miller <ma...@gmail.com>.

On May 5, 2011, at 10:25 AM, Grant Ingersoll wrote:

> 3.  Those who think most should be modularized, but realize it's a ton of work for an unproven gain (although most admit it is a highly likely gain) and should be handled on a case-by-case basis as people do the work.   I don't have anything against modularization, I just know, given my schedule, I won't be able to block off weeks of time to do it.  I'm happy to review where/when I can.

+1. From what I have gathered, Grant and I come down pretty much on the same page on most of this stuff. Yeah, that means I'm reevaluating my position :) but that seems to be the case.

Except I'm more open to IRC discussion :)

- Mark Miller
lucidimagination.com


Re: modularization discussion

Posted by Grant Ingersoll <gs...@apache.org>.

On May 5, 2011, at 4:15 AM, Simon Willnauer wrote:

> Hey folks
> 
> On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> Isn't our end goal here a bunch of well factored search modules?  Ie,
>> fast forward a year or two and I think we should have modules like
>> these:
> 
> I think we have two camps here (10k feet view):
> 

I'd say 3 camps:

> 1. wants to move towards modularization might support all the modules
> mike has listed below
> 2. wants to stick with Solr's current architecture and remain
> "monolithic" (not negative in this case) as much as possible

3.  Those who think most should be modularized, but realize it's a ton of work for an unproven gain (although most admit it is a highly likely gain) and should be handled on a case-by-case basis as people do the work.   I don't have anything against modularization, I just know, given my schedule, I won't be able to block off weeks of time to do it.  I'm happy to review where/when I can.


> 
> I think we can meet somewhere in between and agree on certain module
> that should be available to lucene users as well. The ones I have in
> mind are
> primary search features like:
> - Faceting

Yeah, for instance, Bobo seems to have some interesting faceting implementations that are ASL-licensed; perhaps we can combine them into this new faceting module.

> - Highlighting
> - Suggest
> - Function Query (consolidation is needed here!)
> - Analyzer factories

+1.

> 
> things like distribution and replication should remain in solr IMO but
> might be moved to a more extensible API so that people can add their
> own implementation.

And, of course, all the web tier stuff (response writers, inputs, etc.)

> I am thinking about things like the ZooKeeper
> support that might not be a good solution for everybody where folks
> have already JGroups infrastructure.

Or other similar solutions.  I wonder about using a ZeroConf implementation that can do self-discovery.

> So I think we can work towards 2
> distinct goals.
> 1. extract common search features into modules
> 2. refactor solr to be more "elastic" / "distributed"  and extensible
> with respect to those goals.

3. Make it easier for Solr to be programmatically configured by decoupling the reading of schema.xml and solrconfig.xml from the code that actually contains the structures for the properties (IndexSchema and SolrConfig)

> 
> maybe we can get agreement on such a basis though.
> 
> let me know what you think

I think it's reasonable.  At the end of the day, it broadens the appeal of both Lucene and Solr.  Solr still exists and is not just a "shell"; it remains the primary choice for people who don't want to stitch everything together themselves.  All of it is easier to contribute to b/c people can focus on the core area they know w/o having to know everything else per se.  Stuff should also be better tested because of it, since it will receive broader use.

That being said, and not to be discouraging, I see it as a ton of work.

> 
> simon
>> 
>>  * Faceting
>> 
>>  * Highlighting
>> 
>>  * Suggest (good patch is on LUCENE-2995)
>> 
>>  * Schema
>> 
>>  * Query impls
>> 
>>  * Query parsers
>> 
>>  * Analyzers (good progress here already, thanks Robert!),
>>    incl. factories/XML configuration (still need this)
>> 
>>  * Database import (DIH)
>> 
>>  * Web app
>> 
>>  * Distribution/replication
>> 
>>  * Doc set representations
>> 
>>  * Collapse/grouping
>> 
>>  * Caches
>> 
>>  * Similarity/scoring impls (BM25, etc.)
>> 
>>  * Codecs
>> 
>>  * Joins
>> 
>>  * Lucene core
>> 
>> In this future, much of this code came from what is now Solr and
>> Lucene, but we should freely and aggressively poach from other
>> projects when appropriate (and license/provenance is OK).
>> 
>> I keep seeing all these cool "compressed int set" projects popping
>> up... surely these are useful for us.  Solr poached a doc set impl
>> from Nutch; probably there's other stuff to poach from Nutch, Mahout,
>> etc.
>> 
>> Katta's doing something sweet with distribution/replication; let's
>> poach & merge w/ Solr's approach.  There are various facet impls out
>> there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
>> with Solr's.
>> 
>> Elastic Search has lots of cool stuff, too, under ASL2.
>> 
>> All these external open-source projects are fair game for poaching and
>> refactoring into shared modules, along with what is now Solr and
>> Lucene sources.
>> 
>> In this ideal future, Solr becomes the bundling and default/example
>> configuration of the Web App and other modules, much like how the
>> various Linux distros bundle different stuff together around the Linux
>> kernel.  And if you are an advanced app and don't need the webapp
>> part, you can cherry pick the huper duper modules you do need and
>> directly embedded into your app.
>> 
>> Isn't this the future we are working towards?
>> 
>> Mike
>> 
>> http://blog.mikemccandless.com

--------------------------
Grant Ingersoll
Lucene Revolution -- Lucene and Solr User Conference
May 25-26 in San Francisco
www.lucenerevolution.org



Re: modularization discussion

Posted by Simon Willnauer <si...@googlemail.com>.

Hey folks

On Tue, May 3, 2011 at 6:49 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Isn't our end goal here a bunch of well factored search modules?  Ie,
> fast forward a year or two and I think we should have modules like
> these:

I think we have two camps here (10k feet view):

1. wants to move towards modularization and might support all the modules
Mike has listed below
2. wants to stick with Solr's current architecture and remain
"monolithic" (not negative in this case) as much as possible

I think we can meet somewhere in between and agree on certain modules
that should be available to Lucene users as well. The ones I have in
mind are primary search features like:
- Faceting
- Highlighting
- Suggest
- Function Query (consolidation is needed here!)
- Analyzer factories

Things like distribution and replication should remain in Solr IMO but
might be moved to a more extensible API so that people can add their
own implementation. I am thinking about things like the ZooKeeper
support, which might not be a good solution for everybody, e.g. where folks
already have JGroups infrastructure. So I think we can work towards 2
distinct goals:
1. extract common search features into modules
2. refactor Solr to be more "elastic" / "distributed" and extensible
with respect to those goals.

Maybe we can get agreement on such a basis.

Let me know what you think.

simon
>
>  * Faceting
>
>  * Highlighting
>
>  * Suggest (good patch is on LUCENE-2995)
>
>  * Schema
>
>  * Query impls
>
>  * Query parsers
>
>  * Analyzers (good progress here already, thanks Robert!),
>    incl. factories/XML configuration (still need this)
>
>  * Database import (DIH)
>
>  * Web app
>
>  * Distribution/replication
>
>  * Doc set representations
>
>  * Collapse/grouping
>
>  * Caches
>
>  * Similarity/scoring impls (BM25, etc.)
>
>  * Codecs
>
>  * Joins
>
>  * Lucene core
>
> In this future, much of this code came from what is now Solr and
> Lucene, but we should freely and aggressively poach from other
> projects when appropriate (and license/provenance is OK).
>
> I keep seeing all these cool "compressed int set" projects popping
> up... surely these are useful for us.  Solr poached a doc set impl
> from Nutch; probably there's other stuff to poach from Nutch, Mahout,
> etc.
>
> Katta's doing something sweet with distribution/replication; let's
> poach & merge w/ Solr's approach.  There are various facet impls out
> there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
> with Solr's.
>
> Elastic Search has lots of cool stuff, too, under ASL2.
>
> All these external open-source projects are fair game for poaching and
> refactoring into shared modules, along with what is now Solr and
> Lucene sources.
>
> In this ideal future, Solr becomes the bundling and default/example
> configuration of the Web App and other modules, much like how the
> various Linux distros bundle different stuff together around the Linux
> kernel.  And if you are an advanced app and don't need the webapp
> part, you can cherry pick the huper duper modules you do need and
> directly embedded into your app.
>
> Isn't this the future we are working towards?
>
> Mike
>
> http://blog.mikemccandless.com

Re: modularization discussion

Posted by Jason Rutherglen <ja...@gmail.com>.

+1 to Mike's proposal here.  Each of these could easily be
patches/issues.  The top ones would probably be the basics, e.g.,
faceting and schemas.

As the easiest short-term solution for allowing other systems to use
Solr or its features, it would be great if a 'committer' responded to
SOLR-1431.  It's assigned to someone, and they should respond; otherwise
the issue should probably be unassigned or assigned to someone else.

Lucene is a great project that many people rely on.  Refactoring Solr
will help the project by allowing more people to do more things with
Lucene.  That's an overall 'good' thing for everyone.  Also, have we
lost the ability to execute distributed queries in Lucene?

Taking a step back I'd ask some of the owners of the projects
mentioned why they do not simply submit patches directly to the Apache
Lucene project as opposed to starting their own external projects?

On Tue, May 3, 2011 at 9:49 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Isn't our end goal here a bunch of well factored search modules?  Ie,
> fast forward a year or two and I think we should have modules like
> these:
>
>  * Faceting
>
>  * Highlighting
>
>  * Suggest (good patch is on LUCENE-2995)
>
>  * Schema
>
>  * Query impls
>
>  * Query parsers
>
>  * Analyzers (good progress here already, thanks Robert!),
>    incl. factories/XML configuration (still need this)
>
>  * Database import (DIH)
>
>  * Web app
>
>  * Distribution/replication
>
>  * Doc set representations
>
>  * Collapse/grouping
>
>  * Caches
>
>  * Similarity/scoring impls (BM25, etc.)
>
>  * Codecs
>
>  * Joins
>
>  * Lucene core
>
> In this future, much of this code came from what is now Solr and
> Lucene, but we should freely and aggressively poach from other
> projects when appropriate (and license/provenance is OK).
>
> I keep seeing all these cool "compressed int set" projects popping
> up... surely these are useful for us.  Solr poached a doc set impl
> from Nutch; probably there's other stuff to poach from Nutch, Mahout,
> etc.
>
> Katta's doing something sweet with distribution/replication; let's
> poach & merge w/ Solr's approach.  There are various facet impls out
> there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
> with Solr's.
>
> Elastic Search has lots of cool stuff, too, under ASL2.
>
> All these external open-source projects are fair game for poaching and
> refactoring into shared modules, along with what is now Solr and
> Lucene sources.
>
> In this ideal future, Solr becomes the bundling and default/example
> configuration of the Web App and other modules, much like how the
> various Linux distros bundle different stuff together around the Linux
> kernel.  And if you are an advanced app and don't need the webapp
> part, you can cherry pick the huper duper modules you do need and
> directly embedded into your app.
>
> Isn't this the future we are working towards?
>
> Mike
>
> http://blog.mikemccandless.com

Re: modularization discussion

Posted by Simon Willnauer <si...@googlemail.com>.

On Wed, May 4, 2011 at 3:49 PM, Mark Miller <ma...@gmail.com> wrote:
>
> On May 4, 2011, at 9:42 AM, Uwe Schindler wrote:
>
>> Solr has no performance testing framework, see the issue from today (SOLR-2493).
>
> Come to Berlin Buzzwords!

I think I will come :)

simon
>
> (I know you already are :) )
>
> - Mark Miller
> lucidimagination.com

Re: modularization discussion

Posted by Mark Miller <ma...@gmail.com>.

On May 4, 2011, at 9:42 AM, Uwe Schindler wrote:

> Solr has no performance testing framework, see the issue from today (SOLR-2493).

Come to Berlin Buzzwords!

(I know you already are :) )

- Mark Miller
lucidimagination.com


RE: modularization discussion

Posted by Uwe Schindler <uw...@thetaphi.de>.

> From: Robert Muir [mailto:rcmuir@gmail.com]
> Sent: Wednesday, May 04, 2011 3:30 PM
> To: dev@lucene.apache.org
> Subject: Re: modularization discussion
> 
> On Wed, May 4, 2011 at 9:11 AM, Mark Miller <ma...@gmail.com>
> wrote:
> > Side note (plug): I have been playing with the benchmark module (who did
> that module? I had missed it), and I've got some cool stuff to show at Berlin
> Buzzwords this year for my solr performance talk!
> >
> 
> we svn move'd it here: https://issues.apache.org/jira/browse/LUCENE-2845
> 
> We should feel free to make this depend upon solr now (I know we probably
> have to change some things about the build for that to totally work, but thats
> the idea).

Hihi,

Solr has no performance testing framework, see the issue from today (SOLR-2493).

Uwe



Re: modularization discussion

Posted by Robert Muir <rc...@gmail.com>.

On Wed, May 4, 2011 at 9:11 AM, Mark Miller <ma...@gmail.com> wrote:
> Side note (plug): I have been playing with the benchmark module (who did that module? I had missed it), and I've got some cool stuff to show at Berlin Buzzwords this year for my solr performance talk!
>

we svn move'd it here: https://issues.apache.org/jira/browse/LUCENE-2845

We should feel free to make this depend upon Solr now (I know we
probably have to change some things about the build for that to
totally work, but that's the idea).


Re: modularization discussion

Posted by Mark Miller <ma...@gmail.com>.

On May 4, 2011, at 8:25 AM, Michael McCandless wrote:

> Mark,
> 
> Can you give some more details on your disagreement here...?
> 
> Are there certain modules from my list that you don't think should be
> modules?  The timeframe (1-2 years) is too optimistic/aggressive?  Or
> you disagree that we should poach from outside projects too...?

I don't necessarily disagree with your goals - I'm just saying those are not my goals. 

I think that, just like Minix vs. Linux (should I mention Hurd for Stallman?), there are tradeoffs when trying to tackle some of these things module style vs. monolithic style. Yes, an OS is not Lucene/Solr; I'm going for connotation more than anything here.

Now, if some people came in and just did things module style in a way that matches the monolithic style (quality-wise, feature-wise), and they do that module after module, that is one thing. But I think that is indeed a daunting task, and I think there are a lot of other things to focus on. The end result is not even guaranteed - we seem just as likely to end up with a mess of modules with all kinds of crazy interdependencies. It's really easy to say, yeah, everything should be a module, sounds great, but there are large practical issues there. And from an open source project perspective, it's all even harder to plan. That's why I'm so in favor of going case by case.

I think poaching compatible license open source code is always okay.

> 
> Or, more generally, you don't think Solr benefits from being opened up
> / modularized?

I think there would be benefits for many types of modules. And perhaps some downsides for some depending on the developers involved and how long they stay involved, and some of the interdependency issues that seem likely. Overall, I'm not terribly concerned about modules - they are not on my short term priority list (Analyzers would be for sure though, thanks Robert!).

On the one hand, you might think, well, other Lucene users could take advantage of more of this stuff - and I see that as something kind of nice myself - but they already can use this stuff too: use Solr. So it's just not at the top of my priority list. I happily accept others are more concerned about it.

To wrap up, like I've said a million times, I'm not against modules. I also just don't share that same long term vision right now I guess.

Side note (plug): I have been playing with the benchmark module (who did that module? I had missed it), and I've got some cool stuff to show at Berlin Buzzwords this year for my solr performance talk!

> 
> Mike
> 
> http://blog.mikemccandless.com
> 
> On Tue, May 3, 2011 at 1:11 PM, Mark Miller <ma...@gmail.com> wrote:
>> 
>> On May 3, 2011, at 12:49 PM, Michael McCandless wrote:
>> 
>>> Isn't this the future we are working towards?
>> 
>> No, not really. Others perhaps, but not me. I'm on board with some modules. I do think there are tradeoffs when considering them and considering Lucene and Solr. I'm happy to take everything one issue at a time.
>> 
>> When I voted to merge, no, I certainly was not thinking, I hope in a year or two we have taken everything from Solr and made it a module. I did it for a few specific things to start - analyzers for sure, perhaps some other things as people did something that made sense. I did it so we could share some code more easily - not all code.
>> 
>> Others did it for their own reasons I assume.
>> 
>> But no - I'm not sure I have ever fully subscribed to what you are saying.
>> 
>> - Mark Miller
>> lucidimagination.com

- Mark Miller
lucidimagination.com


Re: modularization discussion

Posted by Michael McCandless <lu...@mikemccandless.com>.

Mark,

Can you give some more details on your disagreement here...?

Are there certain modules from my list that you don't think should be
modules?  The timeframe (1-2 years) is too optimistic/aggressive?  Or
you disagree that we should poach from outside projects too...?

Or, more generally, you don't think Solr benefits from being opened up
/ modularized?

Mike

http://blog.mikemccandless.com

On Tue, May 3, 2011 at 1:11 PM, Mark Miller <ma...@gmail.com> wrote:
>
> On May 3, 2011, at 12:49 PM, Michael McCandless wrote:
>
>> Isn't this the future we are working towards?
>
> No, not really. Others perhaps, but not me. I'm on board with some modules. I do think there are tradeoffs when considering them and considering Lucene and Solr. I'm happy to take everything one issue at a time.
>
> When I voted to merge, no, I certainly was not thinking, I hope in a year or two we have taken everything from Solr and made it a module. I did it for a few specific things to start - analyzers for sure, perhaps some other things as people did something that made sense. I did it so we could share some code more easily - not all code.
>
> Others did it for their own reasons I assume.
>
> But no - I'm not sure I have ever fully subscribed to what you are saying.
>
> - Mark Miller
> lucidimagination.com

Re: modularization discussion

Posted by Ryan McKinley <ry...@gmail.com>.

On Tue, May 3, 2011 at 1:11 PM, Mark Miller <ma...@gmail.com> wrote:
>
> On May 3, 2011, at 12:49 PM, Michael McCandless wrote:
>
>> Isn't this the future we are working towards?
>
> No, not really. Others perhaps, but not me. I'm on board with some modules. I do think there are tradeoffs when considering them and considering Lucene and Solr. I'm happy to take everything one issue at a time.
>

I hope the outcome of this discussion is a shared sense of the
relationship between lucene, solr, and modules -- we need some general
guidelines so that every time this comes up we don't have to have the
same discussion over and over.

Mike, I agree with the general vision -- the details on how it would
actually work suggest that we may have to fast forward more than a
"year or two" for most of these things -- but who knows.

ryan


Re: modularization discussion

Posted by Mark Miller <ma...@gmail.com>.

On May 3, 2011, at 1:29 PM, Shai Erera wrote:

> I don't like that approach. Two years from now, if indeed your vision becomes the reality (obviously, not everyone think like you), what would o.a.solr mean? Who will remember that 'suggest' (just picking an example) came from Solr? Who'd care?
> 
> Why, when I will integrate several modules together, will I need to see o.a.lucene on some, and o.a.solr on others, when both come from the same distro (even same tar.gz file, e.g. modules)?
> 
> What makes sense, at least to me, is that either we call everything o.a.lucene and solr becomes o.a.lucene.solr (I know I've probably pissed off some people with that, sorry), or we come up w/ a new namespace (proposed by Grant I think) o.a.lusolr. If we go with the second, then we'll have 3 namespaces:
> * o.a.lucene for core Lucene stuff (e.g. Lucene core, benchmark?)
> * o.a.solr for pure/core Solr stuff
> * o.a.lusolr for shared modules.

Honestly, I could go for any of those. I can't bring myself to get caught up caring long term what the package names are. You can't even make rules about that - they won't and shouldn't stand over time.

> 
> Picking a good package name is important. And deciding to call everything that came from Solr o.a.solr, just to not offend someone, is not the right way to do things, at least IMO.

Yeah, it's just not a sustainable idea for an open source project anyway.

> 
> Mike, I do share with you the vision you outline, and I believe many of us do. It will become a reality if we factor out modules from Solr and Lucene under /modules. It can also become a reality if someone simply contributes under /modules alternative packages for e.g. faceting, suggest, spellcheck etc. If those are good packages, I doubt "Solr" would be reluctant to adopt them.

> 
> Either way, it's the community that will dictate the future of itself, and not individuals. Perhaps we should stop discussing what can possibly happen, and start doing things. Actions get more results than endless threads. This have been stated on this thread numerous times -- if a contribution is good, well coded, designed, thought of, it will go in. Whether it's a refactoring of something, or a completely new code. I doubt there are people on this community that can stand in the way of it.

This is really the crux of it. IMO, people should be much less concerned with how they perceive others, and more concerned with just doing things. The Apache rules are set up to deal with this type of thing. Those rules can get tricky, and nobody likes to fall back on them - but when you have strong disagreement, that is what they are there for. Not everyone on a project has to agree - nor do they have to have "pure open source" motives. That's just normal and expected. We are a very varied group. The more differences the better IMHO.

Just as a reminder - a couple things I see repeatedly at Apache:

"community over code"
"merit does not expire"

Other than that, the doers do, occasionally we vote, and in general things move along.

> 
> Shai

- Mark Miller
lucidimagination.com


Re: modularization discussion

Posted by Shai Erera <se...@gmail.com>.

> On the namespace, since Yonik seems concerned about it, and others
> aren't (I think?), why don't we leave everything factored out of Solr
> under the org.apache.solr namespace?
> Anyone object to that approach?

I don't like that approach. Two years from now, if indeed your vision
becomes the reality (obviously, not everyone thinks like you), what would
o.a.solr mean? Who will remember that 'suggest' (just picking an example)
came from Solr? Who'd care?

Why, when I will integrate several modules together, will I need to see
o.a.lucene on some, and o.a.solr on others, when both come from the same
distro (even same tar.gz file, e.g. modules)?

What makes sense, at least to me, is that either we call everything
o.a.lucene and solr becomes o.a.lucene.solr (I know I've probably pissed off
some people with that, sorry), or we come up w/ a new namespace (proposed by
Grant I think) o.a.lusolr. If we go with the second, then we'll have 3
namespaces:
* o.a.lucene for core Lucene stuff (e.g. Lucene core, benchmark?)
* o.a.solr for pure/core Solr stuff
* o.a.lusolr for shared modules.

Picking a good package name is important. And deciding to call everything
that came from Solr o.a.solr, just to not offend someone, is not the right
way to do things, at least IMO.

Mike, I do share with you the vision you outline, and I believe many of us
do. It will become a reality if we factor out modules from Solr and Lucene
under /modules. It can also become a reality if someone simply contributes
under /modules alternative packages for e.g. faceting, suggest, spellcheck
etc. If those are good packages, I doubt "Solr" would be reluctant to adopt
them.

Either way, it's the community that will dictate its own future, and
not individuals. Perhaps we should stop discussing what can possibly happen,
and start doing things. Actions get more results than endless threads. This
has been stated on this thread numerous times -- if a contribution is good,
well coded, designed, thought of, it will go in, whether it's a refactoring
of something or completely new code. I doubt there are people in this
community who can stand in the way of it.

Shai

On Tue, May 3, 2011 at 8:11 PM, Mark Miller <ma...@gmail.com> wrote:

>
> On May 3, 2011, at 12:49 PM, Michael McCandless wrote:
>
> > Isn't this the future we are working towards?
>
> No, not really. Others perhaps, but not me. I'm on board with some modules.
> I do think there are tradeoffs when considering them and considering Lucene
> and Solr. I'm happy to take everything one issue at a time.
>
> When I voted to merge, no, I certainly was not thinking, I hope in a year
> or two we have taken everything from Solr and made it a module. I did it for
> a few specific things to start - analyzers for sure, perhaps some other
> things as people did something that made sense. I did it so we could share
> some code more easily - not all code.
>
> Others did it for their own reasons I assume.
>
> But no - I'm not sure I have ever fully subscribed to what you are saying.
>
> - Mark Miller
> lucidimagination.com

Re: modularization discussion

Posted by Mark Miller <ma...@gmail.com>.

On May 3, 2011, at 12:49 PM, Michael McCandless wrote:

> Isn't this the future we are working towards?

No, not really. Others perhaps, but not me. I'm on board with some modules. I do think there are tradeoffs when considering them and considering Lucene and Solr. I'm happy to take everything one issue at a time.

When I voted to merge, no, I certainly was not thinking, I hope in a year or two we have taken everything from Solr and made it a module. I did it for a few specific things to start - analyzers for sure, perhaps some other things as people did something that made sense. I did it so we could share some code more easily - not all code.

Others did it for their own reasons I assume.

But no - I'm not sure I have ever fully subscribed to what you are saying.

- Mark Miller
lucidimagination.com


Re: modularization discussion

Posted by Michael McCandless <lu...@mikemccandless.com>.

Isn't our end goal here a bunch of well factored search modules?  Ie,
fast forward a year or two and I think we should have modules like
these:

  * Faceting

  * Highlighting

  * Suggest (good patch is on LUCENE-2995)

  * Schema

  * Query impls

  * Query parsers

  * Analyzers (good progress here already, thanks Robert!),
    incl. factories/XML configuration (still need this)

  * Database import (DIH)

  * Web app

  * Distribution/replication

  * Doc set representations

  * Collapse/grouping

  * Caches

  * Similarity/scoring impls (BM25, etc.)

  * Codecs

  * Joins

  * Lucene core

In this future, much of this code came from what is now Solr and
Lucene, but we should freely and aggressively poach from other
projects when appropriate (and license/provenance is OK).

I keep seeing all these cool "compressed int set" projects popping
up... surely these are useful for us.  Solr poached a doc set impl
from Nutch; probably there's other stuff to poach from Nutch, Mahout,
etc.

Katta's doing something sweet with distribution/replication; let's
poach & merge w/ Solr's approach.  There are various facet impls out
there (Bobo browse/Zoie; Toke's; Elastic Search); let's poach & merge
with Solr's.

Elastic Search has lots of cool stuff, too, under ASL2.

All these external open-source projects are fair game for poaching and
refactoring into shared modules, along with what is now Solr and
Lucene sources.

In this ideal future, Solr becomes the bundling and default/example
configuration of the Web App and other modules, much like how the
various Linux distros bundle different stuff together around the Linux
kernel.  And if you are an advanced app and don't need the webapp
part, you can cherry pick the huper duper modules you do need and
directly embed them into your app.

Isn't this the future we are working towards?

Mike

http://blog.mikemccandless.com


Re: modularization discussion

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, Apr 27, 2011 at 11:49 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> On Wed, Apr 27, 2011 at 9:25 AM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Wed, Apr 27, 2011 at 6:28 AM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> Why impose namespace restrictions based on where code was originally
>>> committed?  I think the namespace of refactored code should reflect
>>> the nature of the code, not its original origins?
>>
>> And if it's a very core part of solr that we've tended to hang a lot of
>> new features on, etc, then the nature of that code should still
>> hopefully be "solrish".
>
> I'm confused... aren't they all "solrish"?  Like, of the refactorings
> on the table, which ones are not solrish?

The benchmarking stuff definitely originated in lucene-land, there was
much more lucene analysis than solr analysis in that module consolidation,
and non-sandboxish stuff in lucene-contrib that may be refactored/moved
to modules.

> Is the real issue here that you want Solr's name to live on no matter
> how this code is refactored in the future?
>
>>> For example, when I refactored UnInvertedField, it split nicely into a
>>> Solr piece and a core Lucene piece, and so I gave the core Lucene
>>> piece the org.apache.lucene.index namespace.
>>
>> That's because it was factored directly into Lucene-core, not into a module.
>
> OK.
>
>>> I think leaving refactored code in the solr namespace sends the wrong
>>> message (ie, that this module "depends" on Solr somehow).  The lucene
>>> namespace makes it clear that it only depends on Lucene.
>>
>> But that won't be true... it's likely that many modules will depend on other
>> modules.
>
> Sure but that's fine?  Each layer can depend on other stuff in its
> layer, or in stuff in the lower (more "core") layers.  Solr depends on
> Solr stuff and modules and Lucene core.  Modules depend on other
> modules and Lucene core.

But my point was the namespace doesn't tell you what the dependencies
of the modules are.  "lucene" wouldn't mean that it depends on lucene-core
only... (and depending what it is, may not depend on lucene-core at all)
and "solr" wouldn't mean that it depends on solr-core.

>> But as I said... it seems only fair to meet half way and use the solr namespace
>> for some modules and the lucene namespace for others.
>
> Actually I think a whole new namespace (Steven's suggestion) is a
> great idea?  Would that work?  (Else we'll be arguing on every module
> refactoring what namespace it should take...).
>
> Or, I would also be fine with naming all modules factored out of solr
> under the solr namespace, as long as we make it clear that you can use
> them w/o the rest of Solr.

Of course!  That's the whole point of refactoring a module out of some
solr functionality.
Actual dependencies (i.e. which modules depend on which modules) would
be TBD of course.

> Are there other (technical) objections to ongoing refactoring besides
> this namespace problem?

I don't think so in general - as I stated before, w.r.t. LUCENE-2883,
later discussions led me to believe there was very little disagreement
left (and I actually thought some of us had come to an agreement).

-Yonik


Re: modularization discussion

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Wed, Apr 27, 2011 at 9:25 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Wed, Apr 27, 2011 at 6:28 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> Why impose namespace restrictions based on where code was originally
>> committed?  I think the namespace of refactored code should reflect
>> the nature of the code, not its original origins?
>
> And if it's a very core part of solr that we've tended to hang a lot of
> new features on, etc, then the nature of that code should still
> hopefully be "solrish".

I'm confused... aren't they all "solrish"?  Like, of the refactorings
on the table, which ones are not solrish?

Is the real issue here that you want Solr's name to live on no matter
how this code is refactored in the future?

>> For example, when I refactored UnInvertedField, it split nicely into a
>> Solr piece and a core Lucene piece, and so I gave the core Lucene
>> piece the org.apache.lucene.index namespace.
>
> That's because it was factored directly into Lucene-core, not into a module.

OK.

>> I think leaving refactored code in the solr namespace sends the wrong
>> message (ie, that this module "depends" on Solr somehow).  The lucene
>> namespace makes it clear that it only depends on Lucene.
>
> But that won't be true... it's likely that many modules will depend on other
> modules.

Sure but that's fine?  Each layer can depend on other stuff in its
layer, or in stuff in the lower (more "core") layers.  Solr depends on
Solr stuff and modules and Lucene core.  Modules depend on other
modules and Lucene core.

> But as I said... it seems only fair to meet half way and use the solr namespace
> for some modules and the lucene namespace for others.

Actually I think a whole new namespace (Steven's suggestion) is a
great idea?  Would that work?  (Else we'll be arguing on every module
refactoring what namespace it should take...).

Or, I would also be fine with naming all modules factored out of solr
under the solr namespace, as long as we make it clear that you can use
them w/o the rest of Solr.

Are there other (technical) objections to ongoing refactoring besides
this namespace problem?

Mike

http://blog.mikemccandless.com


RE: modularization discussion

Posted by Steven A Rowe <sa...@syr.edu>.

On 4/27/2011 at 9:25 AM, Yonik wrote:
> it seems only fair to meet half way and use the solr namespace
> for some modules and the lucene namespace for others.

Let's eliminate a source of conflict, and make modules another product that is neither Lucene nor Solr.

Steve

Re: modularization discussion

Posted by Mark Miller <ma...@gmail.com>.

On May 2, 2011, at 7:31 PM, Ryan McKinley wrote:

> 
> In short, I believe people should still contribute where they see they can add the most value and according to their time schedules.  Additionally, others who have more time or the ability to refactor for reusability should be free to do so as well.
> 
> I agree that people should be able to contribute where they can; at the same time as a single unified project (lucene+solr) I think there is an objective 'right' place for things -- code designed to have maximum utility and reusability (minimum dependencies without sacrificing functionality).
> 
> Starting things in the right place is often easier than refactoring later -- that said, I don't think it should be a requirement as long as we all agree that things can (and should) be moved to a more reusable place if someone is willing to do the work.
> 
> Thinking about the issue that triggered this debate... in SOLR-2272 (the pseudo-join stuff), I think the heart of the problem was the idea that once committed, this new feature could not be moved around.  With this discussion, I think we agree that it should be refactored if someone is willing to do the work.  It may even be reasonable for someone to mark it as @lucene.experimental if there is serious concern about how hard it is to refactor (and that person is planning to put in some effort to move things in the right direction)
> 
> ryan
> 

+1

- Mark Miller
lucidimagination.com


Re: modularization discussion

Posted by Ryan McKinley <ry...@gmail.com>.

>
>
> In short, I believe people should still contribute where they see they can
> add the most value and according to their time schedules.  Additionally,
> others who have more time or the ability to refactor for reusability should
> be free to do so as well.
>

I agree that people should be able to contribute where they can; at the same
time as a single unified project (lucene+solr) I think there is an objective
'right' place for things -- code designed to have maximum utility and
reusability (minimum dependencies without sacrificing functionality).

Starting things in the right place is often easier than refactoring later --
that said, I don't think it should be a requirement as long as we all agree
that things can (and should) be moved to a more reusable place if someone is
willing to do the work.

Thinking about the issue that triggered this debate... in SOLR-2272 (the
pseudo-join stuff), I think the heart of the problem was the idea that once
committed, this new feature could not be moved around.  With this
discussion, I think we agree that it should be refactored if someone is
willing to do the work.  It may even be reasonable for someone to mark it as
@lucene.experimental if there is serious concern about how hard it is to
refactor (and that person is planning to put in some effort to move things
in the right direction)

ryan

Re: modularization discussion

Posted by Michael McCandless <lu...@mikemccandless.com>.

On the namespace, since Yonik seems concerned about it, and others
aren't (I think?), why don't we leave everything factored out of Solr
under the org.apache.solr namespace?

Anyone object to that approach?

My only concern is that this sends the message that the module depends
on Solr.... but, this turns into a non-issue once Solr is well
factored into modules, because by the time we arrive at that future,
"depending on Solr" just means "depending on Solr modules", which
resolves my concern!

Mike

http://blog.mikemccandless.com

On Mon, May 2, 2011 at 6:11 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Apr 27, 2011, at 11:45 PM, Greg Stein wrote:
>
>> On Wed, Apr 27, 2011 at 09:25:14AM -0400, Yonik Seeley wrote:
>>> ...
>>> But as I said... it seems only fair to meet half way and use the solr namespace
>>> for some modules and the lucene namespace for others.
>>
>> Please explain this part to me... I really don't understand.
>
> At the risk of speaking for someone else, I think it has to do w/ wanting to maintain brand awareness for Solr.  We, as the PMC, currently produce two products:  Apache Lucene and Apache Solr.  I believe Yonik's concern is that if everything is just labeled Lucene, then Solr is just seen as a very thin shell around Lucene (which, IMO, would still not be the case, since wiring together a server app like Solr is non-trivial, but that is my opinion and I'm not sure if Yonik shares it).  Solr has never been a thin shell around Lucene and never will be.  However, in some ways, this gets at why I believe Yonik was interested in a Solr TLP: so that Solr could stand on its own as a brand and as a first class Apache product steered by a PMC that is aligned solely w/ producing the Solr (i.e. as a TLP) product as opposed to the two products we produce now.  (Note, my vote on such a TLP was -1, so please don't confuse me as arguing for the point, I'm just trying to, hopefully, explain it)
>
> That being said, 99% of consumers of Solr never even know what is in the underlying namespace b/c they only ever interact w/ Solr via HTTP (which has solr in the namespace by default) at the server API level, so at least in my mind, I don't care what the namespace used underneath is.  Call it lusolr for all I care.
>
>>
>> What does "fairness" have to do with the codebase?
>
> I can't speak to this, but perhaps it's just the wrong choice of words and would have been better said: please don't take this as a reason to gut Solr and call everything Lucene.
>
>> Isn't the whole
>> point of the Lucene project to create the best code possible, for the
>> benefit of our worldwide users?
>
> It is.  We do that primarily through the release of two products: Lucene and Solr.  Lucene is a Java class library.  A good deal of programming is required to create anything meaningful in terms of a production ready search server.  Solr is a server that takes most things that are programming tasks in Lucene and makes them configuration tasks, as well as adding a fair bit of functionality (distributed search, replication, faceting, auto-suggest, etc.), and is thus that much easier to put in production (I've seen people be in production on Solr in a matter of days/weeks; I've never seen that in Lucene).  The crux of this debate is whether these additional pieces are better served as modules (I think they are) or tightly coupled inside of Solr (which does have a few benefits from a dev. point of view, even though I firmly believe they are outweighed by the positives of modularization).  And, while I think most of us agree that modularization makes sense, that doesn't mean there aren't reasons against it.  I also believe we need to take it on a case by case basis.  I also don't think every patch has to be in its final place on first commit.  As Otis so often says, it's just software.  If it doesn't work, change it.  Thus, if people contribute and it lands in Solr, the committer who commits it need not immediately move it (although, hopefully they will) or ask the contributor to do so, as that will likely dampen contributions.  Likewise for Lucene.  Along with that, if and when others wish to refactor, then they should by all means be allowed to do so assuming of course, all tests across both products still pass.
>
> In short, I believe people should still contribute where they see they can add the most value and according to their time schedules.  Additionally, others who have more time or the ability to refactor for reusability should be free to do so as well.
>
> I don't know what the outcome of this thread should be, so I guess we need to just move forward and keep coding away and working to make things better.  Do others see anything broader here?  A vote?  That would be symbolic, I guess, but doesn't force anyone to do anything since there isn't a specific issue at hand other than a broad concept that is seen as "good".
>
> -Grant


Re: modularization discussion

Posted by Grant Ingersoll <gs...@apache.org>.

On Apr 27, 2011, at 11:45 PM, Greg Stein wrote:

> On Wed, Apr 27, 2011 at 09:25:14AM -0400, Yonik Seeley wrote:
>> ...
>> But as I said... it seems only fair to meet half way and use the solr namespace
>> for some modules and the lucene namespace for others.
> 
> Please explain this part to me... I really don't understand.

At the risk of speaking for someone else, I think it has to do w/ wanting to maintain brand awareness for Solr.  We, as the PMC, currently produce two products:  Apache Lucene and Apache Solr.  I believe Yonik's concern is that if everything is just labeled Lucene, then Solr is just seen as a very thin shell around Lucene (which, IMO, would still not be the case, since wiring together a server app like Solr is non-trivial, but that is my opinion and I'm not sure if Yonik shares it).  Solr has never been a thin shell around Lucene and never will be.  However, in some ways, this gets at why I believe Yonik was interested in a Solr TLP: so that Solr could stand on its own as a brand and as a first class Apache product steered by a PMC that is aligned solely w/ producing the Solr (i.e. as a TLP) product as opposed to the two products we produce now.  (Note, my vote on such a TLP was -1, so please don't confuse me as arguing for the point, I'm just trying to, hopefully, explain it)

That being said, 99% of consumers of Solr never even know what is in the underlying namespace b/c they only ever interact w/ Solr via HTTP (which has solr in the namespace by default) at the server API level, so at least in my mind, I don't care what the namespace used underneath is.  Call it lusolr for all I care.

> 
> What does "fairness" have to do with the codebase?

I can't speak to this, but perhaps it's just the wrong choice of words and would have been better said: please don't take this as a reason to gut Solr and call everything Lucene.

> Isn't the whole
> point of the Lucene project to create the best code possible, for the
> benefit of our worldwide users?

It is.  We do that primarily through the release of two products: Lucene and Solr.  Lucene is a Java class library.  A good deal of programming is required to create anything meaningful in terms of a production ready search server.  Solr is a server that takes most things that are programming tasks in Lucene and makes them configuration tasks, as well as adding a fair bit of functionality (distributed search, replication, faceting, auto-suggest, etc.), and is thus that much easier to put in production (I've seen people be in production on Solr in a matter of days/weeks; I've never seen that in Lucene).  The crux of this debate is whether these additional pieces are better served as modules (I think they are) or tightly coupled inside of Solr (which does have a few benefits from a dev. point of view, even though I firmly believe they are outweighed by the positives of modularization).  And, while I think most of us agree that modularization makes sense, that doesn't mean there aren't reasons against it.  I also believe we need to take it on a case by case basis.  I also don't think every patch has to be in its final place on first commit.  As Otis so often says, it's just software.  If it doesn't work, change it.  Thus, if people contribute and it lands in Solr, the committer who commits it need not immediately move it (although, hopefully they will) or ask the contributor to do so, as that will likely dampen contributions.  Likewise for Lucene.  Along with that, if and when others wish to refactor, then they should by all means be allowed to do so assuming of course, all tests across both products still pass.

In short, I believe people should still contribute where they see they can add the most value and according to their time schedules.  Additionally, others who have more time or the ability to refactor for reusability should be free to do so as well.  

I don't know what the outcome of this thread should be, so I guess we need to just move forward and keep coding away and working to make things better.  Do others see anything broader here?  A vote?  That would be symbolic, I guess, but doesn't force anyone to do anything since there isn't a specific issue at hand other than a broad concept that is seen as "good".

-Grant

Re: modularization discussion

Posted by Greg Stein <gs...@gmail.com>.

On Wed, Apr 27, 2011 at 09:25:14AM -0400, Yonik Seeley wrote:
>...
> But as I said... it seems only fair to meet half way and use the solr namespace
> for some modules and the lucene namespace for others.

Please explain this part to me... I really don't understand.

What does "fairness" have to do with the codebase? Isn't the whole
point of the Lucene project to create the best code possible, for the
benefit of our worldwide users?

How does the concept of "fairness" fit into that?

Cheers,
-g


Re: modularization discussion

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, Apr 27, 2011 at 6:28 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Why impose namespace restrictions based on where code was originally
> committed?  I think the namespace of refactored code should reflect
> the nature of the code, not its original origins?

And if it's a very core part of solr that we've tended to hang a lot of
new features on, etc, then the nature of that code should still
hopefully be "solrish".

> For example, when I refactored UnInvertedField, it split nicely into a
> Solr piece and a core Lucene piece, and so I gave the core Lucene
> piece the org.apache.lucene.index namespace.

That's because it was factored directly into Lucene-core, not into a module.

> I think leaving refactored code in the solr namespace sends the wrong
> message (ie, that this module "depends" on Solr somehow).  The lucene
> namespace makes it clear that it only depends on Lucene.

But that won't be true... it's likely that many modules will depend on other
modules.

But as I said... it seems only fair to meet half way and use the solr namespace
for some modules and the lucene namespace for others.

-Yonik


Re: modularization discussion

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, Apr 26, 2011 at 11:34 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Tue, Apr 26, 2011 at 11:07 PM, Robert Muir <rc...@gmail.com> wrote:
>> It appears there are some problems with modularization of the code,
>> especially between lucene and solr, so I would like for us to have a
>> discussion on this thread.
>
> The specifics of each case matter of course.

I agree.

> Some of the refactored
> code has been changed to use the lucene namespace, and it
> seems only fair that other code that has traditionally been the
> domain of Solr keep the solr namespace. This helps
> keep the proper mindset that code is not being moved "from
> solr to lucene" as too many people keep putting it, but it's being
> exposed to lucene users and is now shared.

Why impose namespace restrictions based on where code was originally
committed?  I think the namespace of refactored code should reflect
the nature of the code, not its original origins?

For example, when I refactored UnInvertedField, it split nicely into a
Solr piece and a core Lucene piece, and so I gave the core Lucene
piece the org.apache.lucene.index namespace.

I think leaving refactored code in the solr namespace sends the wrong
message (ie, that this module "depends" on Solr somehow).  The lucene
namespace makes it clear that it only depends on Lucene.

Eg, the patch on LUCENE-2995 (consolidating our various spell/suggest
impls) also consolidates everything under the lucene namespace, which
I think makes sense?

Mike

http://blog.mikemccandless.com


Re: modularization discussion

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Tue, Apr 26, 2011 at 11:07 PM, Robert Muir <rc...@gmail.com> wrote:
> It appears there are some problems with modularization of the code,
> especially between lucene and solr, so I would like for us to have a
> discussion on this thread.

The specifics of each case matter of course.

As a general point, lucene and solr merged as equals, and
the domain of neither project was diminished by this.  Refactored
code is shared code and goes into modules.  Some of the refactored
code has been changed to use the lucene namespace, and it
seems only fair that other code that has traditionally been the
domain of Solr keep the solr namespace.  This helps
keep the proper mindset that code is not being moved "from
solr to lucene" as too many people keep putting it, but it's being
exposed to lucene users and is now shared.

As this relates to LUCENE-2883, I don't think there's really
much disagreement.  Factor out a function query module,
and make it a Solr module (i.e. use the solr namespace) to
set expectations appropriately, since this is still
core solr code.  All the function query goodness is
exposed to lucene users w/o the full solr stack, and everyone is good!
Someone does need to do the work though... there's been no patch.

-Yonik
