Posted to general@lucene.apache.org by Robert Muir <rc...@gmail.com> on 2010/03/01 17:12:37 UTC

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

This will make the analyzer duplication problem even worse.

On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Mark,
>
> Thanks for your message. I respect your viewpoint, but I respectfully
> disagree. It just seems (to me at least based on the discussion) like a TLP
> for Solr is the way to go.
>
> Cheers,
> Chris
>
>
>
> On 3/1/10 8:54 AM, "Mark Miller" <ma...@gmail.com> wrote:
>
> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
> > Hi Mark,
> >
> >
> >> That would really be no real world change from how things work today.
> >> The fact is, today, Solr already operates essentially as an independent
> >> project.
> >>
> > Well if that's the case, then it would lead me to think that it's more of
> > a TLP than anything else per best practices.
> >
> That depends. It could be argued it should be a top level project or
> that it should be closer to the Lucene project. Some people are arguing
> for both approaches right now. There are two directions we could move in.
> >
> >> The only real difference is that it shares the same PMC with Lucene now
> >> and wouldn't with this change. This would address none of the issues that
> >> triggered the idea for a possible merge.
> >>
> > I don't agree -- you're looking to bring together two communities that
> > are "fairly separate" as you put it. The separation likely didn't spring
> > up overnight and has been this way for a while (at least to my
> > knowledge). This is exactly the type of situation that typically leads
> > to TLP creation from what I've seen.
> >
> >
> It also causes negatives between Solr/Lucene that some are looking to
> address. Hence the birth of this proposal. Going TLP with Solr will only
> aggravate those negatives, not help them.
>
> While the communities operate fairly separately at the moment, the
> people in the communities are not so separate. The committer list has
> huge overlap. Many committers on one project but not the other do a lot
> of work on both projects.
>
> There is already a strong link with the personnel - merging the
> management of the projects addresses many of the concerns that have
> prompted this discussion. TLP'ing Solr only makes those concerns
> multiply. They would diverge further, and incompatible overlap between
> them would increase.
>
> > Cheers,
> > Chris
> >
> >
> >
> >
> >>
> >>
> >> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
> >>
> >>> Hey Grant,
> >>>
> >>> I'd like to explore this: does this imply that the Lucene sub-projects
> >>> will go away and Lucene will turn into Lucene-java and maintain its
> >>> Apache TLP, and then you'd have, say, solr.apache.org, tika.apache.org,
> >>> mahout.apache.org (already started), etc.? If so, that may be the best
> >>> of all worlds, allowing project independence, but also not following
> >>> the Apache "antipattern" as Doug put it...
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>>
> >>>
> >>> On 3/1/10 7:28 AM, "Grant Ingersoll"<gs...@apache.org>   wrote:
> >>>
> >>>
> >>>
> >>>> Also, as Doug alluded to, the Board is likely to ask us to consider
> >>>> fewer subprojects in the future, so we may be consolidating and
> >>>> spinning off anyway.
> >>>>
> >>>>
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Chris Mattmann, Ph.D.
> >>> Senior Computer Scientist
> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>> Office: 171-266B, Mailstop: 171-246
> >>> Email: Chris.Mattmann@jpl.nasa.gov
> >>> Phone: +1 (818) 354-8810
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Adjunct Assistant Professor, Computer Science Department
> >>> University of Southern California, Los Angeles, CA 90089 USA
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> - Mark
> >>
> >> http://www.lucidimagination.com
> >>
> >>
> >>
> >>
> >>
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: Chris.Mattmann@jpl.nasa.gov
> > WWW:   http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
>
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Steve,

Thanks! Yep we just started, and got our mailing lists set up after the positive Incubation vote. You can read the project proposal here:

http://wiki.apache.org/incubator/SpatialProposal

Cheers,
Chris



On 3/1/10 11:41 AM, "Steven A Rowe" <sa...@syr.edu> wrote:

Hi Chris,

On 03/01/2010 at 1:28 PM, Mattmann, Chris A (388J) wrote:
> http://incubator.apache.org/projects/sis.html
>
> We're just starting to tackle that very issue right
> now...patches/ideas/contributions welcome.

Patches?  SVN <https://svn.apache.org/repos/asf/incubator/sis/> looks empty ATM:

        asf - Revision 917638: /incubator/sis

            * ..

        Powered by Subversion version 1.6.9 (r901367).

Also, the website <http://incubator.apache.org/sis/> doesn't seem to exist?:

        Not Found

        The requested URL /sis/ was not found on this server.
        Apache/2.3.5 (Unix) mod_ssl/2.3.5 OpenSSL/0.9.7d
         mod_fcgid/2.3.2-dev Server at incubator.apache.org Port 80

Steve




++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


RE: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Chris,

On 03/01/2010 at 1:28 PM, Mattmann, Chris A (388J) wrote:
> http://incubator.apache.org/projects/sis.html
> 
> We're just starting to tackle that very issue right
> now...patches/ideas/contributions welcome.

Patches?  SVN <https://svn.apache.org/repos/asf/incubator/sis/> looks empty ATM:

	asf - Revision 917638: /incubator/sis

	    * ..

	Powered by Subversion version 1.6.9 (r901367).

Also, the website <http://incubator.apache.org/sis/> doesn't seem to exist?:

	Not Found

	The requested URL /sis/ was not found on this server.
	Apache/2.3.5 (Unix) mod_ssl/2.3.5 OpenSSL/0.9.7d
	 mod_fcgid/2.3.2-dev Server at incubator.apache.org Port 80

Steve


Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by patrick o'leary <pj...@pjaol.com>.
Here's my view on it.

Developing GIS support for Lucene took a little time, some patience, and
a couple of iterations: from a basic concept, to getting buy-in to spend
more time working on it, to an "OMG, this does what we need, build more,
more, more..."

The Lucene version of this was easy enough to support; Solr support,
however, was a different kettle of fish. It went from really crude
duplication of query handlers and write templates to inject distance
features, to today, where it's a little more componentised but still a
little crude unless you want to cut back on scalability and functionality
by using function queries.

I guess my point is that Solr has always required more effort, and
solutions that constantly drove me further away from the initial Lucene
implementation.

In my mind, if I make something work in Lucene, it should be easy to just
'plug in' to Solr, but that is definitely not the case: leaf index
readers, NumericalUtils, and Trie all came at major development costs
that were not present in Lucene development.

Who knows whether the spatial efforts going on in Solr will make it back
to Lucene? At the same time, has the gap between the two systems grown to
the point that porting is not a worthwhile effort?

I honestly don't want to maintain both systems, but I find that to allow
for Solr support I have to do a lot more "hacking".




On Mon, Mar 1, 2010 at 10:46 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> This looks great!
>
> But, the goal is to make a standalone toolkit exposing GIS functions,
> right?
>
> My original question (integrating this into Lucene/Solr) remains.
>
> E.g., there's a lot of good work happening now in Solr to make spatial
> search available.  How will that find its way back to Lucene?  Lucene
> has its own (now duplicate) spatial package that was already
> developed.  Users will now be confused about the two, each having
> different bugs/features, etc.
>
> If we had shared development then the ongoing effort would result in a
> spatial package that direct Lucene users and Solr users would be able
> to use.
>
> Mike
>

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Michael McCandless <lu...@mikemccandless.com>.
This looks great!

But, the goal is to make a standalone toolkit exposing GIS functions,
right?

My original question (integrating this into Lucene/Solr) remains.

E.g., there's a lot of good work happening now in Solr to make spatial
search available.  How will that find its way back to Lucene?  Lucene
has its own (now duplicate) spatial package that was already
developed.  Users will now be confused about the two, each having
different bugs/features, etc.

If we had shared development then the ongoing effort would result in a
spatial package that direct Lucene users and Solr users would be able
to use.

Mike

On Mon, Mar 1, 2010 at 1:28 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> I'm glad that you brought that up! :)
>
> Check out:
>
> http://incubator.apache.org/projects/sis.html
>
> We're just starting to tackle that very issue right now...patches/ideas/contributions welcome.
>
> Cheers,
> Chris
>
>
>
> On 3/1/10 11:25 AM, "Michael McCandless" <lu...@mikemccandless.com> wrote:
>
> Because the code dup with analyzers is only one of the problems to
> solve.  In fact, it's the easiest of the problems to solve (that's why
> I proposed it, only, first).
>
> A more differentiating example is a much less mature module....
>
> EG take spatial -- if Solr were its own TLP, how could spatial be
> built out in a way that we don't waste effort, and so that both direct
> Lucene and Solr users could use it when it's released?
>
> Mike
>
> On Mon, Mar 1, 2010 at 1:07 PM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov> wrote:
>> Hi Mike,
>>
>> I'm not sure I follow this line of thinking: how would Solr being a TLP affect the creation of a separate project/module for Analyzers any more so than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend on the newly created refactored Analysis project.
>>
>> Chris
>>
>>
>>
>> On 3/1/10 10:44 AM, "Michael McCandless" <lu...@mikemccandless.com> wrote:
>>
>> If we don't somehow first address the code duplication across the 2
>> projects, making Solr a TLP will make things worse.
>>
>> I started here with analysis because I think that's the biggest pain
>> point: it seemed like an obvious first step to fixing the code
>> duplication and thus the most likely to reach some consensus.  And
>> it's also very timely: Robert is right now making all kinds of great
>> fixes to our collective analyzers (in between bouts of fuzzy DFA
>> debugging).
>>
>> But it goes beyond analyzers: I'd like to see other modules, now in
>> Solr, eventually moved to Lucene, because they really are "core"
>> functionality (eg facets, function (and other?) queries, spatial,
>> maybe improvements to spellchecker/highlighter).  How can we do this?
>>
>> And how can we do this so that it "lasts" over time?  If new cool
>> "core" things are born in Solr-land (which of course happens a lot --
>> lots of good healthy usage), how will they find their way back to
>> Lucene?
>>
>> Yonik's proposal (merging development of Solr/Lucene, but keeping all
>> else separate) would achieve this.
>>
>> If we do the opposite (Solr -> TLP), how could we possibly achieve
>> this?
>>
>> I guess one possibility is to just suck it up and duplicate the code.
>> Meaning, each project will have to manually merge fixes in from the
>> other project (so long as there's someone around with the itch to do
>> so).  Lucene would copy in all of Solr's analysis, and vice-versa (and
>> likewise other dup'd functionality).  I really dislike this
>> solution... it will confuse the daylights out of users, it's
>> error-prone, it's a waste of dev effort, there will always be little
>> differences... but maybe it is in fact the lesser evil?
>>
>> I would much prefer merging Solr/Lucene development...
>>
>> Mike
>>
>> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J)
>> <ch...@jpl.nasa.gov> wrote:
>>> Hi Grant,
>>>
>>>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>>>>> issue - I was in favor, at the very least, of having a separate
>>>>> module/project/whatever that both Solr/Lucene (and whatever project) can
>>>>> depend on for the shared analyzer code...
>>>>
>>>> Not really.  They are intimately linked.
>>>
>>> Ummm, how so? Making project A called "Apache Super Analyzers" and then
>>> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
>>> of whether or not Lucene(-java) and Solr are TLPs or not...
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>>
>>>>>
>>>>> On 3/1/10 9:12 AM, "Robert Muir" <rc...@gmail.com> wrote:
>>>>>
>>>>> this will make the analyzers duplication problem even worse
>>>>>
>>>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) <
>>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>> Thanks for your message. I respect your viewpoint, but I respectfully
>>>>>> disagree. It just seems (to me at least based on the discussion) like a TLP
>>>>>> for Solr is the way to go.
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/1/10 8:54 AM, "Mark Miller" <ma...@gmail.com> wrote:
>>>>>>
>>>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>>>>>>> Hi Mark,
>>>>>>>
>>>>>>>
>>>>>>>> That would really be no real world change from how things work today.
>>>>>>>> The fact is, today, Solr already operates essentially as an
>>>>>>>> independent project.
>>>>>>>>
>>>>>>> Well if that's the case, then it would lead me to think that it's more of
>>>>>>> a TLP than anything else per best practices.
>>>>>>>
>>>>>> That depends. It could be argued it should be a top level project or
>>>>>> that it should be closer to the Lucene project. Some people are arguing
>>>>>> for both approaches right now. There are two directions we could move in.
>>>>>>>
>>>>>>>> The only real difference is that it shares the same PMC with Lucene now
>>>>>>>> and wouldn't with this change. This would address none of the issues that
>>>>>>>> triggered the idea for a possible merge.
>>>>>>>>
>>>>>>> I don't agree -- you're looking to bring together two communities that
>>>>>>> are "fairly separate" as you put it. The separation likely didn't spring
>>>>>>> up overnight and has been this way for a while (at least to my
>>>>>>> knowledge). This is exactly the type of situation that typically leads
>>>>>>> to TLP creation from what I've seen.
>>>>>>>
>>>>>>>
>>>>>> It also causes negatives between Solr/Lucene that some are looking to
>>>>>> address. Hence the birth of this proposal. Going TLP with Solr will only
>>>>>> aggravate those negatives, not help them.
>>>>>>
>>>>>> While the communities operate fairly separately at the moment, the
>>>>>> people in the communities are not so separate. The committer list has
>>>>>> huge overlap. Many committers on one project but not the other do a lot
>>>>>> of work on both projects.
>>>>>>
>>>>>> There is already a strong link with the personnel - merging the
>>>>>> management of the projects addresses many of the concerns that have
>>>>>> prompted this discussion. TLP'ing Solr only makes those concerns
>>>>>> multiply. They would diverge further, and incompatible overlap between
>>>>>> them would increase.
>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
>>>>>>>>
>>>>>>>>> Hey Grant,
>>>>>>>>>
>>>>>>>>> I'd like to explore this: does this imply that the Lucene sub-projects
>>>>>>>>> will go away and Lucene will turn into Lucene-java and maintain its
>>>>>>>>> Apache TLP, and then you'd have, say, solr.apache.org, tika.apache.org,
>>>>>>>>> mahout.apache.org (already started), etc.? If so, that may be the best
>>>>>>>>> of all worlds, allowing project independence, but also not following
>>>>>>>>> the Apache "antipattern" as Doug put it...
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/1/10 7:28 AM, "Grant Ingersoll"<gs...@apache.org>   wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Also, as Doug alluded to, the Board is likely to ask us to consider
>>>>>>>>>> fewer subprojects in the future, so we may be consolidating and
>>>>>>>>>> spinning off anyway.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> - Mark
>>>>>>>>
>>>>>>>> http://www.lucidimagination.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>>
>>>>>> http://www.lucidimagination.com

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
I'm glad that you brought that up! :)

Check out:

http://incubator.apache.org/projects/sis.html

We're just starting to tackle that very issue right now... patches/ideas/contributions welcome.

Cheers,
Chris



On 3/1/10 11:25 AM, "Michael McCandless" <lu...@mikemccandless.com> wrote:

Because the code dup with analyzers is only one of the problems to
solve.  In fact, it's the easiest of the problems to solve (that's why
I proposed it alone, first).

A more differentiating example is a much less mature module...

E.g., take spatial -- if Solr were its own TLP, how could spatial be
built out in a way that we don't waste effort, and so that both direct
Lucene and Solr users could use it when it's released?

Mike

On Mon, Mar 1, 2010 at 1:07 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> Hi Mike,
>
> I'm not sure I follow this line of thinking: how would Solr being a TLP affect the creation of a separate project/module for Analyzers any more so than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend on the newly created refactored Analysis project.
>
> Chris
>
>
>
> On 3/1/10 10:44 AM, "Michael McCandless" <lu...@mikemccandless.com> wrote:
>
> If we don't somehow first address the code duplication across the 2
> projects, making Solr a TLP will make things worse.
>
> I started here with analysis because I think that's the biggest pain
> point: it seemed like an obvious first step to fixing the code
> duplication and thus the most likely to reach some consensus.  And
> it's also very timely: Robert is right now making all kinds of great
> fixes to our collective analyzers (in between bouts of fuzzy DFA
> debugging).
>
> But it goes beyond analyzers: I'd like to see other modules, now in
> Solr, eventually moved to Lucene, because they really are "core"
> functionality (e.g. facets, function (and other?) queries, spatial,
> maybe improvements to spellchecker/highlighter).  How can we do this?
>
> And how can we do this so that it "lasts" over time?  If new cool
> "core" things are born in Solr-land (which of course happens a lot --
> lots of good healthy usage), how will they find their way back to
> Lucene?
>
> Yonik's proposal (merging development of Solr/Lucene, but keeping all
> else separate) would achieve this.
>
> If we do the opposite (Solr -> TLP), how could we possibly achieve
> this?
>
> I guess one possibility is to just suck it up and duplicate the code.
> Meaning, each project will have to manually merge fixes in from the
> other project (so long as there's someone around with the itch to do
> so).  Lucene would copy in all of Solr's analysis, and vice-versa (and
> likewise other dup'd functionality).  I really dislike this
> solution... it will confuse the daylights out of users, it's
> error-prone, it's a waste of dev effort, there will always be little
> differences... but maybe it is in fact the lesser evil?
>
> I would much prefer merging Solr/Lucene development...
>
> Mike
>
> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov> wrote:
>> Hi Grant,
>>
>>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>>>> issue - I was in favor, at the very least, of having a separate
>>>> module/project/whatever that both Solr/Lucene (and whatever project) can
>>>> depend on for the shared analyzer code...
>>>
>>> Not really.  They are intimately linked.
>>
>> Ummm, how so? Making project A called "Apache Super Analyzers" and then
>> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
>> from whether Lucene(-java) and Solr are TLPs or not...
>>
>> Cheers,
>> Chris
>>
>>
>>>
>>>
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>>
>>>> On 3/1/10 9:12 AM, "Robert Muir" <rc...@gmail.com> wrote:
>>>>
>>>> this will make the analyzers duplication problem even worse
>>>>
>>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) <
>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> Thanks for your message. I respect your viewpoint, but I respectfully
>>>>> disagree. It just seems (to me at least based on the discussion) like a TLP
>>>>> for Solr is the way to go.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>>
>>>>>
>>>>> On 3/1/10 8:54 AM, "Mark Miller" <ma...@gmail.com> wrote:
>>>>>
>>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>>>>>> Hi Mark,
>>>>>>
>>>>>>
>>>>>>> That would really be no real-world change from how things work today. The fact
>>>>>>> is, today, Solr already operates essentially as an independent project.
>>>>>>>
>>>>>> Well, if that's the case, then it would lead me to think that it's more of a
>>>>>> TLP than anything else, per best practices.
>>>>>>
>>>>> That depends. It could be argued it should be a top level project or
>>>>> that it should be closer to the Lucene project. Some people are arguing
>>>>> for both approaches right now. There are two directions we could move in.
>>>>>>
>>>>>>> The only real difference is that it shares the same PMC with Lucene now and
>>>>>>> wouldn't with this change. This would address none of the issues that triggered
>>>>>>> the idea for a possible merge.
>>>>>>>
>>>>>> I don't agree -- you're looking to bring together two communities that are
>>>>>> "fairly separate" as you put it. The separation likely didn't spring up
>>>>>> overnight and has been this way for a while (at least to my knowledge). This is
>>>>>> exactly the type of situation that typically leads to TLP creation from what
>>>>>> I've seen.
>>>>>>
>>>>> It also causes negatives between Solr/Lucene that some are looking to
>>>>> address. Hence the birth of this proposal. Going TLP with Solr will only
>>>>> aggravate those negatives, not help them.
>>>>>
>>>>> While the communities operate fairly separately at the moment, the
>>>>> people in the communities are not so separate. The committer list has
>>>>> huge overlap. Many committers on one project but not the other do a lot
>>>>> of work on both projects.
>>>>>
>>>>> There is already a strong link with the personnel - merging the
>>>>> management of the projects addresses many of the concerns that have
>>>>> prompted this discussion. TLP'ing Solr only makes those concerns
>>>>> multiply. They would diverge further, and incompatible overlap between
>>>>> them would increase.
>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
>>>>>>>
>>>>>>>> Hey Grant,
>>>>>>>>
>>>>>>>> I'd like to explore this -- does this imply that the Lucene sub-projects will
>>>>>>>> go away and Lucene will turn into Lucene-java and maintain its Apache TLP,
>>>>>>>> and then you'd have, say, solr.apache.org, tika.apache.org, mahout.apache.org
>>>>>>>> (already started), etc.? If so, that may be the best of all worlds,
>>>>>>>> allowing project independence, but also not following the Apache
>>>>>>>> "antipattern" as Doug put it...
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/1/10 7:28 AM, "Grant Ingersoll"<gs...@apache.org>   wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Also, as Doug alluded to, the Board is likely to ask us to consider fewer
>>>>>>>>> subprojects in the future, so we may be consolidating and spinning off
>>>>>>>>> anyway.
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> - Mark
>>>>>>>
>>>>>>> http://www.lucidimagination.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Because the code dup with analyzers is only one of the problems to
solve.  In fact, it's the easiest of the problems to solve (that's why
I proposed it alone, first).

A more differentiating example is a much less mature module...

E.g., take spatial -- if Solr were its own TLP, how could spatial be
built out in a way that we don't waste effort, and so that both direct
Lucene and Solr users could use it when it's released?

Mike

On Mon, Mar 1, 2010 at 1:07 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> Hi Mike,
>
> I'm not sure I follow this line of thinking: how would Solr being a TLP affect the creation of a separate project/module for Analyzers any more so than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend on the newly created refactored Analysis project.
>
> Chris
>
>
>
> On 3/1/10 10:44 AM, "Michael McCandless" <lu...@mikemccandless.com> wrote:
>
> If we don't somehow first address the code duplication across the 2
> projects, making Solr a TLP will make things worse.
>
> I started here with analysis because I think that's the biggest pain
> point: it seemed like an obvious first step to fixing the code
> duplication and thus the most likely to reach some consensus.  And
> it's also very timely: Robert is right now making all kinds of great
> fixes to our collective analyzers (in between bouts of fuzzy DFA
> debugging).
>
> But it goes beyond analyzers: I'd like to see other modules, now in
> Solr, eventually moved to Lucene, because they really are "core"
> functionality (e.g. facets, function (and other?) queries, spatial,
> maybe improvements to spellchecker/highlighter).  How can we do this?
>
> And how can we do this so that it "lasts" over time?  If new cool
> "core" things are born in Solr-land (which of course happens a lot --
> lots of good healthy usage), how will they find their way back to
> Lucene?
>
> Yonik's proposal (merging development of Solr/Lucene, but keeping all
> else separate) would achieve this.
>
> If we do the opposite (Solr -> TLP), how could we possibly achieve
> this?
>
> I guess one possibility is to just suck it up and duplicate the code.
> Meaning, each project will have to manually merge fixes in from the
> other project (so long as there's someone around with the itch to do
> so).  Lucene would copy in all of Solr's analysis, and vice-versa (and
> likewise other dup'd functionality).  I really dislike this
> solution... it will confuse the daylights out of users, it's
> error-prone, it's a waste of dev effort, there will always be little
> differences... but maybe it is in fact the lesser evil?
>
> I would much prefer merging Solr/Lucene development...
>
> Mike
>
> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov> wrote:
>> Hi Grant,
>>
>>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>>>> issue - I was in favor, at the very least, of having a separate
>>>> module/project/whatever that both Solr/Lucene (and whatever project) can
>>>> depend on for the shared analyzer code...
>>>
>>> Not really.  They are intimately linked.
>>
>> Ummm, how so? Making project A called "Apache Super Analyzers" and then
>> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
>> from whether Lucene(-java) and Solr are TLPs or not...
>>
>> Cheers,
>> Chris
>>
>>
>>>
>>>
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>>
>>>> On 3/1/10 9:12 AM, "Robert Muir" <rc...@gmail.com> wrote:
>>>>
>>>> this will make the analyzers duplication problem even worse
>>>>
>>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J) <
>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> Thanks for your message. I respect your viewpoint, but I respectfully
>>>>> disagree. It just seems (to me at least based on the discussion) like a TLP
>>>>> for Solr is the way to go.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>>
>>>>>
>>>>> On 3/1/10 8:54 AM, "Mark Miller" <ma...@gmail.com> wrote:
>>>>>
>>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>>>>>> Hi Mark,
>>>>>>
>>>>>>
>>>>>>> That would really be no real-world change from how things work today. The fact
>>>>>>> is, today, Solr already operates essentially as an independent project.
>>>>>>>
>>>>>> Well, if that's the case, then it would lead me to think that it's more of a
>>>>>> TLP than anything else, per best practices.
>>>>>>
>>>>> That depends. It could be argued it should be a top level project or
>>>>> that it should be closer to the Lucene project. Some people are arguing
>>>>> for both approaches right now. There are two directions we could move in.
>>>>>>
>>>>>>> The only real difference is that it shares the same PMC with Lucene now and
>>>>>>> wouldn't with this change. This would address none of the issues that triggered
>>>>>>> the idea for a possible merge.
>>>>>>>
>>>>>> I don't agree -- you're looking to bring together two communities that are
>>>>>> "fairly separate" as you put it. The separation likely didn't spring up
>>>>>> overnight and has been this way for a while (at least to my knowledge). This is
>>>>>> exactly the type of situation that typically leads to TLP creation from what
>>>>>> I've seen.
>>>>>>
>>>>> It also causes negatives between Solr/Lucene that some are looking to
>>>>> address. Hence the birth of this proposal. Going TLP with Solr will only
>>>>> aggravate those negatives, not help them.
>>>>>
>>>>> While the communities operate fairly separately at the moment, the
>>>>> people in the communities are not so separate. The committer list has
>>>>> huge overlap. Many committers on one project but not the other do a lot
>>>>> of work on both projects.
>>>>>
>>>>> There is already a strong link with the personnel - merging the
>>>>> management of the projects addresses many of the concerns that have
>>>>> prompted this discussion. TLP'ing Solr only makes those concerns
>>>>> multiply. They would diverge further, and incompatible overlap between
>>>>> them would increase.
>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
>>>>>>>
>>>>>>>> Hey Grant,
>>>>>>>>
>>>>>>>> I'd like to explore this -- does this imply that the Lucene sub-projects will
>>>>>>>> go away and Lucene will turn into Lucene-java and maintain its Apache TLP,
>>>>>>>> and then you'd have, say, solr.apache.org, tika.apache.org, mahout.apache.org
>>>>>>>> (already started), etc.? If so, that may be the best of all worlds,
>>>>>>>> allowing project independence, but also not following the Apache
>>>>>>>> "antipattern" as Doug put it...
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/1/10 7:28 AM, "Grant Ingersoll"<gs...@apache.org>   wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Also, as Doug alluded to, the Board is likely to ask us to consider fewer
>>>>>>>>> subprojects in the future, so we may be consolidating and spinning off
>>>>>>>>> anyway.
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> - Mark
>>>>>>>
>>>>>>> http://www.lucidimagination.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Michael McCandless <lu...@mikemccandless.com>.
The possibility of slowing down releases is the only real concern I
also share...

But I think release frequency is largely a matter of discipline :)

Digging into it, as long as the project keeps a "stable trunk"
(something Lucene has always tried to do -- does Solr?), release
frequency really comes down to discipline.

I mean in Lucene we keep saying we want faster releases, but why
doesn't it happen?  Couldn't we have done 2X as many releases in the
past few years?  Did we "really" want to release more frequently?

If we really want to take it seriously I think we should have someone
unofficially be the next release czar.  As soon as a release is
finished, this czar is responsible for roughly planning the next one.
This means making a tentative schedule, tracking big features and
making sure they land "early" enough to bake fully on trunk, etc.

New modules (e.g. spatial) need not gate the release -- that module's
docs would call out clearly that it's not fully baked yet...

Mike

On Mon, Mar 1, 2010 at 1:13 PM, Michael Busch <bu...@gmail.com> wrote:
> It seems like most of the people agree with these good goals but are
> concerned about the release cycles (including me). How can we achieve these
> goals without making releases more difficult?
>
>  Michael
>
> On 3/1/10 9:44 AM, Michael McCandless wrote:
>>
>> If we don't somehow first address the code duplication across the 2
>> projects, making Solr a TLP will make things worse.
>>
>> I started here with analysis because I think that's the biggest pain
>> point: it seemed like an obvious first step to fixing the code
>> duplication and thus the most likely to reach some consensus.  And
>> it's also very timely: Robert is right now making all kinds of great
>> fixes to our collective analyzers (in between bouts of fuzzy DFA
>> debugging).
>>
>> But it goes beyond analyzers: I'd like to see other modules, now in
>> Solr, eventually moved to Lucene, because they really are "core"
>> functionality (e.g. facets, function (and other?) queries, spatial,
>> maybe improvements to spellchecker/highlighter).  How can we do this?
>>
>> And how can we do this so that it "lasts" over time?  If new cool
>> "core" things are born in Solr-land (which of course happens a lot --
>> lots of good healthy usage), how will they find their way back to
>> Lucene?
>>
>> Yonik's proposal (merging development of Solr/Lucene, but keeping all
>> else separate) would achieve this.
>>
>> If we do the opposite (Solr ->  TLP), how could we possibly achieve
>> this?
>>
>> I guess one possibility is to just suck it up and duplicate the code.
>> Meaning, each project will have to manually merge fixes in from the
>> other project (so long as there's someone around with the itch to do
>> so).  Lucene would copy in all of Solr's analysis, and vice-versa (and
>> likewise other dup'd functionality).  I really dislike this
>> solution... it will confuse the daylights out of users, it's
>> error-prone, it's a waste of dev effort, there will always be little
>> differences... but maybe it is in fact the lesser evil?
>>
>> I would much prefer merging Solr/Lucene development...
>>
>> Mike
>>
>> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J)
>> <ch...@jpl.nasa.gov>  wrote:
>>
>>>
>>> Hi Grant,
>>>
>>>
>>>>
>>>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>>>
>>>>
>>>>>
>>>>> Hi Robert,
>>>>>
>>>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole
>>>>> analyzers
>>>>> issue - I was in favor, at the very least, of having a separate
>>>>> module/project/whatever that both Solr/Lucene (and whatever project)
>>>>> can
>>>>> depend on for the shared analyzer code...
>>>>>
>>>>
>>>> Not really.  They are intimately linked.
>>>>
>>>
>>> Ummm, how so? Making project A called "Apache Super Analyzers" and then
>>> making Lucene(-java) and Solr depend on Apache Super Analyzers is
>>> separate
>>> of whether or not Lucene(-java) and Solr are TLPs or not...
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>>
>>>>>
>>>>> On 3/1/10 9:12 AM, "Robert Muir"<rc...@gmail.com>  wrote:
>>>>>
>>>>> this will make the analyzers duplication problem even worse
>>>>>
>>>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J)<
>>>>> chris.a.mattmann@jpl.nasa.gov>  wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>> Thanks for your message. I respect your viewpoint, but I respectfully
>>>>>> disagree. It just seems (to me at least based on the discussion) like
>>>>>> a TLP
>>>>>> for Solr is the way to go.
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/1/10 8:54 AM, "Mark Miller"<ma...@gmail.com>  wrote:
>>>>>>
>>>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>>
>>>>>>>> That would really be no real world change from how things work
>>>>>>>> today. The fact is, today, Solr already operates essentially as
>>>>>>>> an independent project.
>>>>>>>>
>>>>>>> Well if that's the case, then it would lead me to think that it's
>>>>>>> more of a TLP than anything else per best practices.
>>>>>>>
>>>>>> That depends. It could be argued it should be a top level project or
>>>>>> that it should be closer to the Lucene project. Some people are
>>>>>> arguing for both approaches right now. There are two directions we
>>>>>> could move in.
>>>>>>
>>>>>>>> The only real difference is that it shares the same PMC with Lucene
>>>>>>>> now and wouldn't with this change. This would address none of the
>>>>>>>> issues that triggered the idea for a possible merge.
>>>>>>>>
>>>>>>> I don't agree -- you're looking to bring together two communities
>>>>>>> that are "fairly separate" as you put it. The separation likely
>>>>>>> didn't spring up overnight and has been this way for a while (at
>>>>>>> least to my knowledge). This is exactly the type of situation that
>>>>>>> typically leads to TLP creation from what I've seen.
>>>>>>>
>>>>>> It also causes negatives between Solr/Lucene that some are looking to
>>>>>> address. Hence the birth of this proposal. Going TLP with Solr will
>>>>>> only aggravate those negatives, not help them.
>>>>>>
>>>>>> While the communities operate fairly separately at the moment, the
>>>>>> people in the communities are not so separate. The committer list has
>>>>>> huge overlap. Many committers on one project but not the other do a
>>>>>> lot
>>>>>> of work on both projects.
>>>>>>
>>>>>> There is already a strong link with the personnel - merging the
>>>>>> management of the projects addresses many of the concerns that have
>>>>>> prompted this discussion. TLP'ing Solr only makes those concerns
>>>>>> multiply. They would diverge further, and incompatible overlap between
>>>>>> them would increase.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
>>>>>>>>
>>>>>>>>> Hey Grant,
>>>>>>>>>
>>>>>>>>> I'd like to explore this -- does this imply that the Lucene
>>>>>>>>> sub-projects will go away and Lucene will turn into Lucene-java
>>>>>>>>> and maintain its Apache TLP, and then you'd have say,
>>>>>>>>> solr.apache.org, tika.apache.org, mahout.apache.org (already
>>>>>>>>> started), etc. etc.? If so, that may be the best of all worlds,
>>>>>>>>> allowing project independence, but also not following the Apache
>>>>>>>>> "antipattern" as Doug put it...
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On 3/1/10 7:28 AM, "Grant Ingersoll"<gs...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Also, as Doug alluded to, the Board is likely to ask us to
>>>>>>>>>> consider fewer subprojects in the future, so we may be
>>>>>>>>>> consolidating and spinning off anyway.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> Chris Mattmann, Ph.D.
>>>>>>>>> Senior Computer Scientist
>>>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>>>> Email: Chris.Mattmann@jpl.nasa.gov
>>>>>>>>> Phone: +1 (818) 354-8810
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> - Mark
>>>>>>>>
>>>>>>>> http://www.lucidimagination.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>>
>>>>>> http://www.lucidimagination.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Robert Muir
>>>>> rcmuir@gmail.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Michael Busch <bu...@gmail.com>.
It seems like most people agree with these good goals but are
concerned about the release cycles (including me). How can we achieve
these goals without making releases more difficult?

  Michael

On 3/1/10 9:44 AM, Michael McCandless wrote:
> If we don't somehow first address the code duplication across the 2
> projects, making Solr a TLP will make things worse.
>
> I started here with analysis because I think that's the biggest pain
> point: it seemed like an obvious first step to fixing the code
> duplication and thus the most likely to reach some consensus.  And
> it's also very timely: Robert is right now making all kinds of great
> fixes to our collective analyzers (in between bouts of fuzzy DFA
> debugging).
>
> But it goes beyond analyzers: I'd like to see other modules, now in
> Solr, eventually moved to Lucene, because they really are "core"
> functionality (eg facets, function (and other?) queries, spatial,
> maybe improvements to spellchecker/highlighter).  How can we do this?
>
> And how can we do this so that it "lasts" over time?  If new cool
> "core" things are born in Solr-land (which of course happens a lot --
> lots of good healthy usage), how will they find their way back to
> Lucene?
>
> Yonik's proposal (merging development of Solr/Lucene, but keeping all
> else separate) would achieve this.
>
> If we do the opposite (Solr ->  TLP), how could we possibly achieve
> this?
>
> I guess one possibility is to just suck it up and duplicate the code.
> Meaning, each project will have to manually merge fixes in from the
> other project (so long as there's someone around with the itch to do
> so).  Lucene would copy in all of Solr's analysis, and vice-versa (and
> likewise other dup'd functionality).  I really dislike this
> solution... it will confuse the daylights out of users, it's
> error-prone, it's a waste of dev effort, there will always be little
> differences... but maybe it is in fact the lesser evil?
>
> I would much prefer merging Solr/Lucene development...
>
> Mike
>
> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov>  wrote:
>    
>> Hi Grant,
>>
>>      
>>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>>
>>>        
>>>> Hi Robert,
>>>>
>>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>>>> issue - I was in favor, at the very least, of having a separate
>>>> module/project/whatever that both Solr/Lucene (and whatever project) can
>>>> depend on for the shared analyzer code...
>>>>          
>>> Not really.  They are intimately linked.
>>>        
>> Ummm, how so? Making project A called "Apache Super Analyzers" and then
>> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
>> of whether or not Lucene(-java) and Solr are TLPs or not...
>>
>> Cheers,
>> Chris
>>
>>
>>      
>>>
>>>        
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>>
>>>> On 3/1/10 9:12 AM, "Robert Muir"<rc...@gmail.com>  wrote:
>>>>
>>>> this will make the analyzers duplication problem even worse
>>>>
>>>> On Mon, Mar 1, 2010 at 11:06 AM, Mattmann, Chris A (388J)<
>>>> chris.a.mattmann@jpl.nasa.gov>  wrote:
>>>>
>>>>          
>>>>> Hi Mark,
>>>>>
>>>>> Thanks for your message. I respect your viewpoint, but I respectfully
>>>>> disagree. It just seems (to me at least based on the discussion) like a TLP
>>>>> for Solr is the way to go.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>>
>>>>>
>>>>> On 3/1/10 8:54 AM, "Mark Miller"<ma...@gmail.com>  wrote:
>>>>>
>>>>> On 03/01/2010 10:40 AM, Mattmann, Chris A (388J) wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>>
>>>>>>> That would really be no real world change from how things work today.
>>>>>>> The fact is, today, Solr already operates essentially as an
>>>>>>> independent project.
>>>>>>>
>>>>>> Well if that's the case, then it would lead me to think that it's more
>>>>>> of a TLP than anything else per best practices.
>>>>>>
>>>>> That depends. It could be argued it should be a top level project or
>>>>> that it should be closer to the Lucene project. Some people are arguing
>>>>> for both approaches right now. There are two directions we could move in.
>>>>>
>>>>>>> The only real difference is that it shares the same PMC with Lucene
>>>>>>> now and wouldn't with this change. This would address none of the
>>>>>>> issues that triggered the idea for a possible merge.
>>>>>>>
>>>>>> I don't agree -- you're looking to bring together two communities that
>>>>>> are "fairly separate" as you put it. The separation likely didn't
>>>>>> spring up overnight and has been this way for a while (at least to my
>>>>>> knowledge). This is exactly the type of situation that typically leads
>>>>>> to TLP creation from what I've seen.
>>>>>>
>>>>> It also causes negatives between Solr/Lucene that some are looking to
>>>>> address. Hence the birth of this proposal. Going TLP with Solr will only
>>>>> aggravate those negatives, not help them.
>>>>>
>>>>> While the communities operate fairly separately at the moment, the
>>>>> people in the communities are not so separate. The committer list has
>>>>> huge overlap. Many committers on one project but not the other do a lot
>>>>> of work on both projects.
>>>>>
>>>>> There is already a strong link with the personnel - merging the
>>>>> management of the projects addresses many of the concerns that have
>>>>> prompted this discussion. TLP'ing Solr only makes those concerns
>>>>> multiply. They would diverge further, and incompatible overlap between
>>>>> them would increase.
>>>>>
>>>>>            
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>>>>
>>>>>>> On 03/01/2010 10:04 AM, Mattmann, Chris A (388J) wrote:
>>>>>>>
>>>>>>>> Hey Grant,
>>>>>>>>
>>>>>>>> I'd like to explore this -- does this imply that the Lucene
>>>>>>>> sub-projects will go away and Lucene will turn into Lucene-java and
>>>>>>>> maintain its Apache TLP, and then you'd have say, solr.apache.org,
>>>>>>>> tika.apache.org, mahout.apache.org (already started), etc. etc.? If
>>>>>>>> so, that may be the best of all worlds, allowing project
>>>>>>>> independence, but also not following the Apache "antipattern" as
>>>>>>>> Doug put it...
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 3/1/10 7:28 AM, "Grant Ingersoll"<gs...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Also, as Doug alluded to, the Board is likely to ask us to consider
>>>>>>>>> fewer subprojects in the future, so we may be consolidating and
>>>>>>>>> spinning off anyway.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                  
>>>>>>> --
>>>>>>> - Mark
>>>>>>>
>>>>>>> http://www.lucidimagination.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>            
>>>>
>>>> --
>>>> Robert Muir
>>>> rcmuir@gmail.com
>>>>
>>>>
>>>>
>>>>
>>>>          
>>>
>>>        
>>
>>
>>
>>
>>      
>    


Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Also, there still seems to be a misconception about what's being proposed
here.

The proposal is to synchronize the development of Solr and Lucene.
I.e., a single dev list, single set of committers, synchronized
releases.

Everything else remains the same.  E.g., the release artifacts, users'
lists, web sites, branding, all remain separate.

How the source code is modularized is an orthogonal question.  We've
discussed breaking things out of Lucene's core, like query parser,
queries, analyzers, into their own modules (and shipping their own
artifacts), which I still think makes great sense.  But it's
independent of synchronizing our development.

Mike

On Mon, Mar 1, 2010 at 1:03 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> On Mon, Mar 1, 2010 at 12:58 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
>> On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote:
>>
>>> But it goes beyond analyzers: I'd like to see other modules, now in
>>> Solr, eventually moved to Lucene, because they really are "core"
>>> functionality (eg facets, function (and other?) queries, spatial,
>>> maybe improvements to spellchecker/highlighter).
>>
>> I disagree.  Those don't belong in core, and though they are all
>> great features, adding them to core constitutes "bloat", IMO.
>>
>> The Query class belongs in core.  All those other modules should be
>> distributed as plugins, which could be used by Solr, Katta, Lucene,
>> whatever.
>>
>> Note that this is orthogonal to whether Solr and Lucene merge or
>> diverge.
>
> I agree with this (sorry I wasn't clear).
>
> By "core functionality" I mean it should be a separate module (plugin)
> that direct Lucene users can use, not "whenever you install core
> Lucene you get these functions".
>
> I.e., users shouldn't have to install Solr to use facets with Lucene.
>
> Mike
>

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, Mar 1, 2010 at 12:58 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote:
>
>> But it goes beyond analyzers: I'd like to see other modules, now in
>> Solr, eventually moved to Lucene, because they really are "core"
>> functionality (eg facets, function (and other?) queries, spatial,
>> maybe improvements to spellchecker/highlighter).
>
> I disagree.  Those don't belong in core, and though they are all
> great features, adding them to core constitutes "bloat", IMO.
>
> The Query class belongs in core.  All those other modules should be
> distributed as plugins, which could be used by Solr, Katta, Lucene,
> whatever.
>
> Note that this is orthogonal to whether Solr and Lucene merge or
> diverge.

I agree with this (sorry I wasn't clear).

By "core functionality" I mean it should be a separate module (plugin)
that direct Lucene users can use, not "whenever you install core
Lucene you get these functions".

I.e., users shouldn't have to install Solr to use facets with Lucene.

Mike

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote:

> But it goes beyond analyzers: I'd like to see other modules, now in
> Solr, eventually moved to Lucene, because they really are "core"
> functionality (eg facets, function (and other?) queries, spatial,
> maybe improvements to spellchecker/highlighter).  

I disagree.  Those don't belong in core, and though they are all great
features, adding them to core constitutes "bloat", IMO.

The Query class belongs in core.  All those other modules should be
distributed as plugins, which could be used by Solr, Katta, Lucene, whatever.

Note that this is orthogonal to whether Solr and Lucene merge or diverge.

Marvin Humphrey


Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Mark Miller <ma...@gmail.com>.
On 03/01/2010 01:43 PM, Chris Hostetter wrote:
> (Man, why is it you guys always decide to start the monolithic
> "let's redesign the world" threads while I'm offline for a few days ...
> I figured at worst I'd 'svn up' and discover that McCandless had
> reimplemented all of the indexing code in Scala, but I certainly wasn't
> expecting all of this.)
>
> As someone who has attempted to read it all at once, let me just say that
> this thread is way too big.
>
> I say this not as a facetious comment about the number of messages or the
> depth of replies but as a serious comment about the breadth and depth of
> the core issues that people seem to be trying to address in a monolithic
> fashion -- monolithic suggestions which are in many ways diametrically
> opposed to each other.
>    
Personally, I don't think the idea of a merge is too big. I think the
implications of it are less than you are making them out to be.
Monolithic suggestions? Let's half merge? Let's draft a resolution
indicating that both Lucene and Solr devs would like to possibly play
nicer together with more communication? I don't think there are a lot of
baby steps towards this goal that will have any meaning or ramifications.

> Without obvious consensus on where we want to go, or a clear sense of
> how well things will work when we get "there", it seems most productive
> to focus on what would be needed to achieve some incremental steps that
> could be productive for any/all goals.
>    
That sounds like magic to me :) Or focusing on stuff that has nothing to 
do with a merge or TLP.
> At its core: this thread started with McCandless's suggestion that
> refactoring some of the text analysis code from Solr, Nutch and Lucene-Java
> out of all three projects and into a common code base would be beneficial
> to all three subprojects -- not only do I see no flaw in that reasoning,
> but it also seems like it would (oddly enough) serve as a good first step
> towards *either* tighter development integration between Lucene-Java and
> Solr, *OR* towards looser development of the two code bases (via making
> Solr a separate TLP).
>
> Developing a new code module like this should help demonstrate / exercise
> some of the "process" issues that might come up in trying to integrate the
> development and release processes of the existing products.  If things
> work out "well" that may illustrate that tighter integration is better; if
> things work out "poorly" that should also tell us something, and may give
> us guidance on how to move forward.  In the worst case scenario that I can
> imagine: some code is refactored out of Solr and Nutch in a way that makes
> it more directly usable by other consumers of Lucene-Java.  (Even if Solr
> and Nutch never use that code and become their own TLPs and secede from
> the ASF to become a Caribbean tax haven, that seems like a net win for
> Lucene-Java.)
>
> To put the issue another way: Does anyone see how McCandless's suggestion
> would be counter-productive towards your vision of what Lucene/Solr/Nutch
> should be like in the future? (regardless of what your particular vision is)
>    
No, not necessarily - but I don't think it's going to tell us anything
useful about a merge. It's just going to factor out some analyzers into
what is likely going to be yet *another* project with more "do we run on
trunk" or "don't we" issues. Or it will be a Lucene contrib, and cause us
even more headaches due to Solr not running on trunk.

> 			...
>
> : I started here with analysis because I think that's the biggest pain
> : point: it seemed like an obvious first step to fixing the code
> : duplication and thus the most likely to reach some consensus.  And
> : it's also very timely: Robert is right now making all kinds of great
> : fixes to our collective analyzers (in between bouts of fuzzy DFA
> : debugging).
>
>
>
> -Hoss
>    


-- 
- Mark

http://www.lucidimagination.com




Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Hoss,

I support Mike's original suggestion of having a shared, independently maintained/released analysis package for Nutch/Solr/Lucene. I emphatically do not support merging Solr and Lucene in the way proposed.

Hope that clarifies things, at least from me.

Cheers,
Chris



On 3/1/10 11:43 AM, "Chris Hostetter" <ho...@fucit.org> wrote:



(Man, why is it you guys always decide to start the monolithic
"let's redesign the world" threads while I'm offline for a few days ...
I figured at worst I'd 'svn up' and discover that McCandless had
reimplemented all of the indexing code in Scala, but I certainly wasn't
expecting all of this.)

As someone who has attempted to read it all at once, let me just say that
this thread is way too big.

I say this not as a facetious comment about the number of messages or the
depth of replies but as a serious comment about the breadth and depth of
the core issues that people seem to be trying to address in a monolithic
fashion -- monolithic suggestions which are in many ways diametrically
opposed to each other.

Without obvious consensus on where we want to go, or a clear sense of
how well things will work when we get "there", it seems most productive
to focus on what would be needed to achieve some incremental steps that
could be productive for any/all goals.

At its core: this thread started with McCandless's suggestion that
refactoring some of the text analysis code from Solr, Nutch and Lucene-Java
out of all three projects and into a common code base would be beneficial
to all three subprojects -- not only do I see no flaw in that reasoning,
but it also seems like it would (oddly enough) serve as a good first step
towards *either* tighter development integration between Lucene-Java and
Solr, *OR* towards looser development of the two code bases (via making
Solr a separate TLP).

Developing a new code module like this should help demonstrate / exercise
some of the "process" issues that might come up in trying to integrate the
development and release processes of the existing products.  If things
work out "well" that may illustrate that tighter integration is better; if
things work out "poorly" that should also tell us something, and may give
us guidance on how to move forward.  In the worst case scenario that I can
imagine: some code is refactored out of Solr and Nutch in a way that makes
it more directly usable by other consumers of Lucene-Java.  (Even if Solr
and Nutch never use that code and become their own TLPs and secede from
the ASF to become a Caribbean tax haven, that seems like a net win for
Lucene-Java.)

To put the issue another way: Does anyone see how McCandless's suggestion
would be counter-productive towards your vision of what Lucene/Solr/Nutch
should be like in the future? (regardless of what your particular vision is)

                        ...

: I started here with analysis because I think that's the biggest pain
: point: it seemed like an obvious first step to fixing the code
: duplication and thus the most likely to reach some consensus.  And
: it's also very timely: Robert is right now making all kinds of great
: fixes to our collective analyzers (in between bouts of fuzzy DFA
: debugging).



-Hoss



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Chris Hostetter <ho...@fucit.org>.
(Man, why is it you guys always decide to start the monolithic
"let's redesign the world" threads while I'm offline for a few days ...
I figured at worst I'd 'svn up' and discover that McCandless had
reimplemented all of the indexing code in Scala, but I certainly wasn't
expecting all of this.)

As someone who has attempted to read it all at once, let me just say that
this thread is way too big.

I say this not as a facetious comment about the number of messages or the 
depth of replies but as a serious comment about the breadth and depth of 
the core issues that people seem to be trying to address in a monolithic 
fashion -- monolithic suggestions which are in many ways diametrically
opposed to each other.

Without obvious consensus on where we want to go, or a clear sense of
how well things will work when we get "there", it seems most productive
to focus on what would be needed to achieve some incremental steps that
could be productive for any/all goals.

At its core: this thread started with McCandless's suggestion that
refactoring some of the text analysis code from Solr, Nutch and Lucene-Java
out of all three projects and into a common code base would be beneficial
to all three subprojects -- Not only do I see no flaw in that reasoning,
but it also seems like it would (oddly enough) serve as a good first step
towards *either* tighter development integration between Lucene-Java and
Solr, *OR* towards looser development of the two code bases (via making
Solr a separate TLP).

Developing a new code module like this should help demonstrate / exercise
some of the "process" issues that might come up in trying to integrate the
development and release processes of the existing products.  If things
work out "well" that may illustrate that tighter integration is better; if
things work out "poorly" that should also tell us something, and may give
us guidance on how to move forward.  In the worst case scenario that I can
imagine: some code is refactored out of Solr and Nutch in a way that makes
it more directly usable by other consumers of Lucene-Java.  (Even if Solr
and Nutch never use that code and become their own TLPs and secede from
the ASF to become Caribbean tax havens, that seems like a net win for
Lucene-Java.)

To put the issue another way: Does anyone see how McCandless's suggestion
would be counter-productive towards your vision of what Lucene/Solr/Nutch
should be like in the future? (regardless of what your particular vision is)

			...

: I started here with analysis because I think that's the biggest pain
: point: it seemed like an obvious first step to fixing the code
: duplication and thus the most likely to reach some consensus.  And
: it's also very timely: Robert is right now making all kinds of great
: fixes to our collective analyzers (in between bouts of fuzzy DFA
: debugging).



-Hoss

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Mike,

I'm not sure I follow this line of thinking: how would Solr being a TLP affect the creation of a separate project/module for Analyzers any more so than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend on the newly created refactored Analysis project.

Chris



On 3/1/10 10:44 AM, "Michael McCandless" <lu...@mikemccandless.com> wrote:

If we don't somehow first address the code duplication across the 2
projects, making Solr a TLP will make things worse.

I started here with analysis because I think that's the biggest pain
point: it seemed like an obvious first step to fixing the code
duplication and thus the most likely to reach some consensus.  And
it's also very timely: Robert is right now making all kinds of great
fixes to our collective analyzers (in between bouts of fuzzy DFA
debugging).

But it goes beyond analyzers: I'd like to see other modules, now in
Solr, eventually moved to Lucene, because they really are "core"
functionality (eg facets, function (and other?) queries, spatial,
maybe improvements to spellchecker/highlighter).  How can we do this?

And how can we do this so that it "lasts" over time?  If new cool
"core" things are born in Solr-land (which of course happens a lot --
lots of good healthy usage), how will they find their way back to
Lucene?

Yonik's proposal (merging development of Solr/Lucene, but keeping all
else separate) would achieve this.

If we do the opposite (Solr -> TLP), how could we possibly achieve
this?

I guess one possibility is to just suck it up and duplicate the code.
Meaning, each project will have to manually merge fixes in from the
other project (so long as there's someone around with the itch to do
so).  Lucene would copy in all of Solr's analysis, and vice-versa (and
likewise other dup'd functionality).  I really dislike this
solution... it will confuse the daylights out of users, it's
error-prone, it's a waste of dev effort, there will always be little
differences... but maybe it is in fact the lesser evil?

I would much prefer merging Solr/Lucene development...

Mike



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Michael McCandless <lu...@mikemccandless.com>.
If we don't somehow first address the code duplication across the 2
projects, making Solr a TLP will make things worse.

I started here with analysis because I think that's the biggest pain
point: it seemed like an obvious first step to fixing the code
duplication and thus the most likely to reach some consensus.  And
it's also very timely: Robert is right now making all kinds of great
fixes to our collective analyzers (in between bouts of fuzzy DFA
debugging).

But it goes beyond analyzers: I'd like to see other modules, now in
Solr, eventually moved to Lucene, because they really are "core"
functionality (eg facets, function (and other?) queries, spatial,
maybe improvements to spellchecker/highlighter).  How can we do this?

And how can we do this so that it "lasts" over time?  If new cool
"core" things are born in Solr-land (which of course happens a lot --
lots of good healthy usage), how will they find their way back to
Lucene?

Yonik's proposal (merging development of Solr/Lucene, but keeping all
else separate) would achieve this.

If we do the opposite (Solr -> TLP), how could we possibly achieve
this?

I guess one possibility is to just suck it up and duplicate the code.
Meaning, each project will have to manually merge fixes in from the
other project (so long as there's someone around with the itch to do
so).  Lucene would copy in all of Solr's analysis, and vice-versa (and
likewise other dup'd functionality).  I really dislike this
solution... it will confuse the daylights out of users, it's
error-prone, it's a waste of dev effort, there will always be little
differences... but maybe it is in fact the lesser evil?

I would much prefer merging Solr/Lucene development...

Mike

On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> Hi Grant,
>
>> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>>
>>> Hi Robert,
>>>
>>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>>> issue - I was in favor, at the very least, of having a separate
>>> module/project/whatever that both Solr/Lucene (and whatever project) can
>>> depend on for the shared analyzer code...
>>
>> Not really.  They are intimately linked.
>
> Ummm, how so? Making project A called "Apache Super Analyzers" and then
> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
> of whether or not Lucene(-java) and Solr are TLPs or not...
>
> Cheers,
> Chris

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Simon Willnauer <si...@googlemail.com>.
IMO the only downside is that we risk a longer release cycle if we
merge. It requires a certain level of discipline, but hasn't that always
been the case?! Everything else seems to be a win for both communities,
and I personally would love to see the communities come closer together
again. I have worked on many analyzers, removing code duplication while
maintaining BW compat, and almost every time we committed a change it
caused a new issue in Solr which could have been fixed in one go.
Concerns that Solr could slow us down while maintaining BW compat
appear invalid to me, as Solr, being a direct customer of the
Lucene API, would enforce our policy, which is a good thing.

I also agree with Robert that moving Solr into a TLP would make things
even worse.


On Mon, Mar 1, 2010 at 6:02 PM, Robert Muir <rc...@gmail.com> wrote:
> but Yonik's proposal (or at least some of the ideas from it?) is attractive
> as it seems to solve the real problem that created the duplication in the
> first place, which is not limited to analyzers.
>
> On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hi Grant,
>>
>> > On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
>> >
>> >> Hi Robert,
>> >>
>> >> I think my proposal (Solr->TLP) is sort of orthogonal to the whole
>> analyzers
>> >> issue - I was in favor, at the very least, of having a separate
>> >> module/project/whatever that both Solr/Lucene (and whatever project) can
>> >> depend on for the shared analyzer code...
>> >
>> > Not really.  They are intimately linked.
>>
>> Ummm, how so? Making project A called "Apache Super Analyzers" and then
>> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
>> of whether or not Lucene(-java) and Solr are TLPs or not...
>>
>> Cheers,
>> Chris
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Robert Muir <rc...@gmail.com>.
But Yonik's proposal (or at least some of the ideas from it?) is attractive,
as it seems to solve the real problem that created the duplication in the
first place, which is not limited to analyzers.

On Mon, Mar 1, 2010 at 12:01 PM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Grant,
>
> > On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
> >
> >> Hi Robert,
> >>
> >> I think my proposal (Solr->TLP) is sort of orthogonal to the whole
> >> analyzers issue - I was in favor, at the very least, of having a separate
> >> module/project/whatever that both Solr/Lucene (and whatever project) can
> >> depend on for the shared analyzer code...
> >
> > Not really.  They are intimately linked.
>
> Ummm, how so? Making project A called "Apache Super Analyzers" and then
> making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
> from whether or not Lucene(-java) and Solr are TLPs...
>
> Cheers,
> Chris


-- 
Robert Muir
rcmuir@gmail.com

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Grant,

> On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:
> 
>> Hi Robert,
>> 
>> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers
>> issue - I was in favor, at the very least, of having a separate
>> module/project/whatever that both Solr/Lucene (and whatever project) can
>> depend on for the shared analyzer code...
> 
> Not really.  They are intimately linked.

Ummm, how so? Making project A called "Apache Super Analyzers" and then
making Lucene(-java) and Solr depend on Apache Super Analyzers is separate
from whether or not Lucene(-java) and Solr are TLPs...

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by Grant Ingersoll <gs...@apache.org>.
On Mar 1, 2010, at 8:20 AM, Mattmann, Chris A (388J) wrote:

> Hi Robert,
> 
> I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers issue - I was in favor, at the very least, of having a separate module/project/whatever that both Solr/Lucene (and whatever project) can depend on for the shared analyzer code...

Not really.  They are intimately linked.




Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Robert,

I think my proposal (Solr->TLP) is sort of orthogonal to the whole analyzers issue - I was in favor, at the very least, of having a separate module/project/whatever that both Solr/Lucene (and whatever project) can depend on for the shared analyzer code...

Cheers,
Chris



On 3/1/10 9:12 AM, "Robert Muir" <rc...@gmail.com> wrote:

this will make the analyzers duplication problem even worse


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++