Posted to dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2005/02/06 16:50:23 UTC

more sandbox questions

XML-Indexing-Demo - I propose this be moved to an "examples" area if we 
keep it at all.

parsers - Is anyone using the PDF parser here?

taglib - my bad in committing this in the first place - it's not well 
implemented and of marginal use.  I propose to remove it entirely.

miscellaneous - I propose this be moved to contrib/util.

similarity & spellchecker - I propose these be combined with 
contrib/util.

Thoughts on these?

The contrib area should be useful add-ons to Lucene's core, and isn't 
really appropriate for examples/demos, it seems to me.

The tricky pieces are miscellaneous, similarity, and spellchecker.  
These are tiny by themselves and putting them in a util area and 
packaging them all together seems ok to me at one level, but does it make 
more sense to keep these completely separate?

On a related note, should we combine snowball in with analyzers?  Or 
leave it on its own still?

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: more sandbox questions

Posted by David Spencer <da...@tropo.com>.
Erik Hatcher wrote:

> On Feb 7, 2005, at 1:21 AM, David Spencer wrote:
> 
>> Erik Hatcher wrote:
>>
>>> XML-Indexing-Demo - I propose this be moved to an "examples" area if 
>>> we keep it at all.
>>> parsers - Is anyone using the PDF parser here?
>>> taglib - my bad in committing this in the first place - it's not well 
>>> implemented and of marginal use.  I propose to remove it entirely.
>>> miscellaneous - I propose this be moved to contrib/util.
>>> similarity & spellchecker - I propose these be combined with 
>>> contrib/util.
>>> Thoughts on these?
>>
>>
>> Another way of looking at it is to group query expansion code together 
>> i.e. similarity + spellchecker + wordnet go together. I think calling 
>> things "util" or "misc" demeans them - but disclaimer, these 3 things 
>> are coincidentally all mine.
> 
> 
> No offense or demeaning intended.

None taken! Sorry, I should have made that clear.
I agree w/ trying to make sense of the packaging as that gives Lucene 
more value.


>  I wasn't that happy with an umbrella 
> "util" area myself, but also am trying to ensure we have a clean and 
> sensible contrib area.  Keep in mind that the idea is to package each 
> contrib project as its own separate package within the Lucene 
> distribution.  So highlighter, with the Lucene 2.0 release, would be 
> packaged as highlighter-2.0.jar.  The WordNet package is unique in that 
> it is not something you add on to an application using Lucene, but 
> rather a tool that is used to generate an index for use with your 

This may not be quite precise - the WordNet pkg does 2 things, [1] 
builds a synonym index and [2] expands queries. [2] is done in 
SynExpand.java.

Thus I thought it would make sense to think of a "query expansion" 
module and group this + the similarity stuff...

> application.  I'm not sure how these distinctions factor into how we 
> package things.
> 
>>> The contrib area should be useful add-ons to Lucene's core, and isn't 
>>> really appropriate for examples/demos, it seems to me.
>>> The tricky pieces are miscellaneous, similarity, and spellchecker.  
>>> These are tiny by themselves and putting them in a util area and 
>>> packaging them all together seems ok to me at one level, but does it 
>>> make more sense to keep these completely separate?
>>
>>
>> OK, to be more concrete, I'll suggest the 3 above go to "search" or 
>> "query-expansion".
> 
> 
> "search" is too generic, it seems, since all of Lucene could fit under 
> that categorization.  Maybe it makes the most sense to leave them as-is 
> for the time being - though keeping it open for discussion is good to 
> see what others think.
> 
>     Erik

Re: more sandbox questions

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 7, 2005, at 1:21 AM, David Spencer wrote:
> Erik Hatcher wrote:
>
>> XML-Indexing-Demo - I propose this be moved to an "examples" area if 
>> we keep it at all.
>> parsers - Is anyone using the PDF parser here?
>> taglib - my bad in committing this in the first place - it's not well 
>> implemented and of marginal use.  I propose to remove it entirely.
>> miscellaneous - I propose this be moved to contrib/util.
>> similarity & spellchecker - I propose these be combined with 
>> contrib/util.
>> Thoughts on these?
>
> Another way of looking at it is to group query expansion code together 
> i.e. similarity + spellchecker + wordnet go together. I think calling 
> things "util" or "misc" demeans them - but disclaimer, these 3 things 
> are coincidentally all mine.

No offense or demeaning intended.  I wasn't that happy with an umbrella 
"util" area myself, but also am trying to ensure we have a clean and 
sensible contrib area.  Keep in mind that the idea is to package each 
contrib project as its own separate package within the Lucene 
distribution.  So highlighter, with the Lucene 2.0 release, would be 
packaged as highlighter-2.0.jar.  The WordNet package is unique in that 
it is not something you add on to an application using Lucene, but 
rather a tool that is used to generate an index for use with your 
application.  I'm not sure how these distinctions factor into how we 
package things.

>> The contrib area should be useful add-ons to Lucene's core, and isn't 
>> really appropriate for examples/demos, it seems to me.
>> The tricky pieces are miscellaneous, similarity, and spellchecker.  
>> These are tiny by themselves and putting them in a util area and 
>> packaging them all together seems ok to me at one level, but does it 
>> make more sense to keep these completely separate?
>
> OK, to be more concrete, I'll suggest the 3 above go to "search" or 
> "query-expansion".

"search" is too generic, it seems, since all of Lucene could fit under 
that categorization.  Maybe it makes the most sense to leave them as-is 
for the time being - though keeping it open for discussion is good to 
see what others think.

	Erik


Re: more sandbox questions

Posted by David Spencer <da...@tropo.com>.
Erik Hatcher wrote:

> XML-Indexing-Demo - I propose this be moved to an "examples" area if we 
> keep it at all.
> 
> parsers - Is anyone using the PDF parser here?
> 
> taglib - my bad in committing this in the first place - it's not well 
> implemented and of marginal use.  I propose to remove it entirely.
> 
> miscellaneous - I propose this be moved to contrib/util.
> 
> similarity & spellchecker - I propose these be combined with 
> contrib/util.
> 
> Thoughts on these?

Another way of looking at it is to group query expansion code together 
i.e. similarity + spellchecker + wordnet go together. I think calling 
things "util" or "misc" demeans them - but disclaimer, these 3 things 
are coincidentally all mine.


> 
> The contrib area should be useful add-ons to Lucene's core, and isn't 
> really appropriate for examples/demos, it seems to me.
> 
> The tricky pieces are miscellaneous, similarity, and spellchecker.  
> These are tiny by themselves and putting them in a util area and 
> packaging them all together seems ok to me at one level, but does it make 
> more sense to keep these completely separate?

OK, to be more concrete, I'll suggest the 3 above go to "search" or 
"query-expansion".

> 
> On a related note, should we combine snowball in with analyzers?  Or 
> leave it on its own still?
> 
>     Erik