Posted to dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2005/02/06 16:50:23 UTC
more sandbox questions
XML-Indexing-Demo - I propose this be moved to an "examples" area if we
keep it at all.
parsers - Is anyone using the PDF parser here?
taglib - my bad in committing this in the first place - it's not well
implemented and of marginal use. I propose to remove it entirely.
miscellaneous - I propose that this be moved to contrib/util.
similarity & spellchecker - I propose these be combined into
contrib/util.
Thoughts on these?
The contrib area should be useful add-ons to Lucene's core, and isn't
really appropriate for examples/demos, it seems to me.
The tricky pieces are miscellaneous, similarity, and spellchecker.
These are tiny by themselves and putting them in a util area and
packaging them all together seems ok to me at one level, but does it make
more sense to keep these completely separate?
On a related note, should we combine snowball in with analyzers? Or
leave it on its own still?
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: more sandbox questions
Posted by David Spencer <da...@tropo.com>.
Erik Hatcher wrote:
> On Feb 7, 2005, at 1:21 AM, David Spencer wrote:
>
>> Erik Hatcher wrote:
>>
>>> XML-Indexing-Demo - I propose this be moved to an "examples" area if
>>> we keep it at all.
>>> parsers - Is anyone using the PDF parser here?
>>> taglib - my bad in committing this in the first place - it's not well
>>> implemented and of marginal use. I propose to remove it entirely.
>>> miscellaneous - I propose that this be moved to contrib/util.
>>> similarity & spellchecker - I propose these be combined into
>>> contrib/util.
>>> Thoughts on these?
>>
>>
>> Another way of looking at it is to group query expansion code together
>> i.e. similarity + spellchecker + wordnet go together. I think calling
>> things "util" or "misc" demeans them - but disclaimer, these 3 things
>> are coincidentally all mine.
>
>
> No offense or demeaning intended.
None taken! Sorry, I should have made that clear.
I agree w/ trying to make sense of the packaging as that gives Lucene
more value.
> I wasn't that happy with an umbrella
> "util" area myself, but also am trying to ensure we have a clean and
> sensible contrib area. Keep in mind that the idea is to package each
> contrib project as its own separate package within the Lucene
> distribution. So highlighter, with the Lucene 2.0 release, would be
> packaged as highlighter-2.0.jar. The WordNet package is unique in that
> it is not something you add on to an application using Lucene, but
> rather a tool that is used to generate an index for use with your
This may not be quite precise - the WordNet pkg does 2 things, [1]
builds a synonym index and [2] expands queries. [2] is done in
SynExpand.java.
Thus I thought it would make sense to think of a "query expansion"
module and group this + the similarity stuff...
> application. I'm not sure how these distinctions factor into how we
> package things.
>
>>> The contrib area should be useful add-ons to Lucene's core, and isn't
>>> really appropriate for examples/demos, it seems to me.
>>> The tricky pieces are miscellaneous, similarity, and spellchecker.
>>> These are tiny by themselves and putting them in a util area and
>>> packaging them all together seems ok to me at one level, but does it
>>> make more sense to keep these completely separate?
>>
>>
>> OK, to be more concrete, I'll suggest the 3 above go to "search" or
>> "query-expansion".
>
>
> "search" is too generic, it seems, since all of Lucene could fit under
> that categorization. Maybe it makes the most sense to leave them as-is
> for the time being - though keeping it open for discussion is good to
> see what others think.
>
> Erik
>
>
Re: more sandbox questions
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 7, 2005, at 1:21 AM, David Spencer wrote:
> Erik Hatcher wrote:
>
>> XML-Indexing-Demo - I propose this be moved to an "examples" area if
>> we keep it at all.
>> parsers - Is anyone using the PDF parser here?
>> taglib - my bad in committing this in the first place - it's not well
>> implemented and of marginal use. I propose to remove it entirely.
>> miscellaneous - I propose that this be moved to contrib/util.
>> similarity & spellchecker - I propose these be combined into
>> contrib/util.
>> Thoughts on these?
>
> Another way of looking at it is to group query expansion code together
> i.e. similarity + spellchecker + wordnet go together. I think calling
> things "util" or "misc" demeans them - but disclaimer, these 3 things
> are coincidentally all mine.
No offense or demeaning intended. I wasn't that happy with an umbrella
"util" area myself, but also am trying to ensure we have a clean and
sensible contrib area. Keep in mind that the idea is to package each
contrib project as its own separate package within the Lucene
distribution. So highlighter, with the Lucene 2.0 release, would be
packaged as highlighter-2.0.jar. The WordNet package is unique in that
it is not something you add on to an application using Lucene, but
rather a tool that is used to generate an index for use with your
application. I'm not sure how these distinctions factor into how we
package things.
>> The contrib area should be useful add-ons to Lucene's core, and isn't
>> really appropriate for examples/demos, it seems to me.
>> The tricky pieces are miscellaneous, similarity, and spellchecker.
>> These are tiny by themselves and putting them in a util area and
>> packaging them all together seems ok to me at one level, but does it
>> make more sense to keep these completely separate?
>
> OK, to be more concrete, I'll suggest the 3 above go to "search" or
> "query-expansion".
"search" is too generic, it seems, since all of Lucene could fit under
that categorization. Maybe it makes the most sense to leave them as-is
for the time being - though keeping it open for discussion is good to
see what others think.
Erik
Re: more sandbox questions
Posted by David Spencer <da...@tropo.com>.
Erik Hatcher wrote:
> XML-Indexing-Demo - I propose this be moved to an "examples" area if we
> keep it at all.
>
> parsers - Is anyone using the PDF parser here?
>
> taglib - my bad in committing this in the first place - it's not well
> implemented and of marginal use. I propose to remove it entirely.
>
> miscellaneous - I propose that this be moved to contrib/util.
>
> similarity & spellchecker - I propose these be combined into
> contrib/util.
>
> Thoughts on these?
Another way of looking at it is to group query expansion code together
i.e. similarity + spellchecker + wordnet go together. I think calling
things "util" or "misc" demeans them - but disclaimer, these 3 things
are coincidentally all mine.
>
> The contrib area should be useful add-ons to Lucene's core, and isn't
> really appropriate for examples/demos, it seems to me.
>
> The tricky pieces are miscellaneous, similarity, and spellchecker.
> These are tiny by themselves and putting them in a util area and
> packaging them all together seems ok to me at one level, but does it make
> more sense to keep these completely separate?
OK, to be more concrete, I'll suggest the 3 above go to "search" or
"query-expansion".
>
> On a related note, should we combine snowball in with analyzers? Or
> leave it on its own still?
>
> Erik
>
>