Posted to dev@lucene.apache.org by bu...@apache.org on 2004/05/13 20:05:47 UTC
DO NOT REPLY [Bug 28960] New: - Add "an" to the English stop words
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=28960>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=28960
Add "an" to the English stop words
Summary: Add "an" to the English stop words
Product: Lucene
Version: unspecified
Platform: PC
OS/Version: Windows NT/2K
Status: NEW
Severity: Minor
Priority: Other
Component: Analysis
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: ats37@hotmail.com
In org.apache.lucene.analysis.StopAnalyzer, the ENGLISH_STOP_WORDS array
contains "a" but not "an". So searching for "a fund" will get the same hits as
"fund", but searching for "an investment" will get many more hits than "investment".
This is true in the latest revision of the file, but appears to have always been
the case. I'm amazed nobody's pointed it out before now; our users had only
been testing for a few hours before they complained about it :-)
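The effect described above can be sketched without Lucene itself. The class below is a hypothetical, self-contained imitation of stop-word analysis; it is not Lucene's StopAnalyzer API. Its stop list is an assumed five-word subset that contains "a" but not "an", and query terms are combined with OR, which is assumed here to model the query parser's default operator. Under those assumptions a surviving "an" term matches every document that contains "an", so "an investment" returns more hits than "investment".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical imitation of stop-word analysis (NOT Lucene's API):
 * lowercase the text, split on non-letters, drop stop words.
 */
public class StopWordDemo {

    // Assumed small subset of an English stop list: has "a" but not "an".
    static final Set<String> OLD_STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "and", "the", "of", "to"));

    // The same assumed subset with "an" added, as the bug report proposes.
    static final Set<String> NEW_STOP_WORDS;
    static {
        Set<String> s = new HashSet<>(OLD_STOP_WORDS);
        s.add("an");
        NEW_STOP_WORDS = s;
    }

    /** Lowercases, splits on runs of non-letters, and removes stop words. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Counts documents that contain ANY analyzed query term (OR semantics). */
    static int countHits(String query, List<String> docs, Set<String> stopWords) {
        List<String> terms = analyze(query, stopWords);
        int hits = 0;
        for (String doc : docs) {
            List<String> docTerms = analyze(doc, stopWords);
            for (String term : terms) {
                if (docTerms.contains(term)) {
                    hits++;
                    break; // count each document at most once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical toy corpus, made up for illustration.
        List<String> docs = Arrays.asList(
            "An investment fund",
            "Buy an apple",
            "An hour of investment advice",
            "Investment basics");

        System.out.println(analyze("a fund", OLD_STOP_WORDS));                // [fund]
        System.out.println(analyze("an investment", OLD_STOP_WORDS));         // [an, investment]
        System.out.println(countHits("an investment", docs, OLD_STOP_WORDS)); // 4
        System.out.println(countHits("investment", docs, OLD_STOP_WORDS));    // 3
        System.out.println(countHits("an investment", docs, NEW_STOP_WORDS)); // 3
    }
}
```

With "an" in the stop set, "an investment" and "investment" produce the same single query term and therefore the same hits, mirroring how "a fund" and "fund" already behave. In a real application, one option (if your Lucene version lets you pass a custom stop-word array to the analyzer) is to supply your own list rather than rely on the built-in one.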
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: DO NOT REPLY [Bug 28960] New: - Add "an" to the English stop words
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Yeah, I think that would cause problems for some people.
I'm for closing that bug, and maybe even for removing all the stop
words from the Lucene core so people don't rely on them; they are
really more for a demo, and filtering like that should not be done.
Otis
--- Erik Hatcher <er...@ehatchersolutions.com> wrote:
> I don't mind adding "an" to the list, but should we be concerned about
> any backwards compatibility issues with this change?
>
> Erik
Re: DO NOT REPLY [Bug 28960] New: - Add "an" to the English stop words
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I don't mind adding "an" to the list, but should we be concerned about
any backwards compatibility issues with this change?
Erik
On May 13, 2004, at 2:05 PM, bugzilla@apache.org wrote:
> In org.apache.lucene.analysis.StopAnalyzer, the ENGLISH_STOP_WORDS array
> contains "a" but not "an". So searching for "a fund" will get the same hits as
> "fund", but searching for "an investment" will get many more hits than
> "investment".
>
> This is true in the latest revision of the file, but appears to have always been
> the case. I'm amazed nobody's pointed it out before now; our users had only
> been testing for a few hours before they complained about it :-)