Posted to java-user@lucene.apache.org by Greg Huber <gr...@gmail.com> on 2017/01/27 15:42:20 UTC

Strange results returned from suggester

Hello,

Is there any way to see why items are returned from the suggester, similar
to the search?

I have a really strange case where if I enter 'will' (without the quotes)
it seems to return all the search results.

example:

There should be two entries beginning with will*, i.e. william and Willoughby.

wil >  two entries with correct highlight
will > all entries with NO highlight
willi > single entry
willo > single entry

I have checked, and "will" does not appear in all the entries!

Cheers Greg

Re: Strange results returned from suggester

Posted by Greg Huber <gr...@gmail.com>.
Uwe,

Perfect, exactly what I was looking for.  No duplication and no ongoing
maintenance (since it uses the defaults) :-)

return CustomAnalyzer.builder()
    .withTokenizer(StandardTokenizerFactory.class)
    .addTokenFilter(StandardFilterFactory.class)
    .addTokenFilter(LowerCaseFilterFactory.class)
    .addTokenFilter(SuggestStopFilterFactory.class)
    .build();

Thanks Greg.


RE: Strange results returned from suggester

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

CustomAnalyzer is a very generic thing. It has a builder that you can use to configure your analyzer. You can define which Tokenizer and which StopFilter to use (passing stop words as you like), and add stemming. No, it does not subclass StopwordAnalyzerBase, but that is not needed, because it has a generic configuration interface.

So I don't understand your problem. Lucene APIs take the abstract Analyzer class, and CustomAnalyzer provides it in the same way as StandardAnalyzer. CustomAnalyzer is basically the same as Solr's schema.xml and Elasticsearch's analyzer index config.

The first example in the Javadocs is more or less StandardAnalyzer, just adapt it and pass the factory:
http://lucene.apache.org/core/6_4_0/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
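
The "declare the components" idea described above can be sketched in plain Java without any Lucene dependency. This is only a miniature under stated assumptions: MiniAnalyzer, its Builder, and the whitespace and lowerCase components are hypothetical names invented for illustration, and they are not Lucene's CustomAnalyzer API. The point is that a chain assembled from a tokenizer plus an ordered list of filters can have one stage exchanged without subclassing anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical miniature of a declarative analyzer; NOT Lucene's CustomAnalyzer.
class MiniAnalyzer {
    private final Function<String, List<String>> tokenizer;
    private final List<UnaryOperator<List<String>>> filters;

    private MiniAnalyzer(Function<String, List<String>> tokenizer,
                         List<UnaryOperator<List<String>>> filters) {
        this.tokenizer = tokenizer;
        this.filters = filters;
    }

    // Run the tokenizer, then each filter in the order it was declared.
    List<String> analyze(String text) {
        List<String> tokens = tokenizer.apply(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private Function<String, List<String>> tokenizer;
        private final List<UnaryOperator<List<String>>> filters = new ArrayList<>();

        Builder withTokenizer(Function<String, List<String>> t) {
            tokenizer = t;
            return this;
        }

        Builder addTokenFilter(UnaryOperator<List<String>> f) {
            filters.add(f);
            return this;
        }

        MiniAnalyzer build() {
            if (tokenizer == null) {
                throw new IllegalStateException("a tokenizer is required");
            }
            return new MiniAnalyzer(tokenizer, new ArrayList<>(filters));
        }
    }

    // Sample component: split on whitespace, dropping empty pieces.
    static List<String> whitespace(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Sample component: lower-case every token.
    static List<String> lowerCase(List<String> tokens) {
        return tokens.stream()
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        MiniAnalyzer analyzer = MiniAnalyzer.builder()
                .withTokenizer(MiniAnalyzer::whitespace)
                .addTokenFilter(MiniAnalyzer::lowerCase)
                .build();
        System.out.println(analyzer.analyze("William Willoughby"));
    }
}
```

Swapping the stop-word stage in such a chain is a one-line change to the declaration, which is the reason a final StandardAnalyzer class does not get in the way when the analyzer is assembled from components.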



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Strange results returned from suggester

Posted by Greg Huber <gr...@gmail.com>.
Uwe,

>...or use CustomAnalyzer then you don't need to
> subclass. Just declare the components.

StandardAnalyzer is marked final and extends StopwordAnalyzerBase, so if I
need its code, how would I do this?

Cheers Greg


RE: Strange results returned from suggester

Posted by Uwe Schindler <uw...@thetaphi.de>.
...or use CustomAnalyzer; then you don't need to subclass. Just declare the components.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: Strange results returned from suggester

Posted by Michael McCandless <lu...@mikemccandless.com>.
Wonderful, thank you for bringing closure!  Stop words and analyzing
suggesters are a tricky combo ...

Mike McCandless

http://blog.mikemccandless.com


On Sun, Jan 29, 2017 at 6:37 AM, Greg Huber <gr...@gmail.com> wrote:
> Mike,
>
> Many thanks, it works perfectly now.
>
> Cheers Greg
>
> On 29 January 2017 at 11:28, Michael McCandless <lu...@mikemccandless.com>
> wrote:
>>
>> That's right, just make your own analyzer, forked from
>> StandardAnalyzer, and change out the StopFilter.  The analyzer is a
>> tiny class and this (creating your own components in an analyzer) is
>> normal practice...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Sat, Jan 28, 2017 at 6:09 AM, Greg Huber <gr...@gmail.com> wrote:
>> > Michael,
>> >
>> > Thanks for the update, so I just duplicate StandardAnalyzer and replace:
>> >
>> >
>> > //tok = new StopFilter(tok, stopwords);
>> >   tok = new SuggestStopFilter(tok, stopwords);
>> >
>> > in createComponents(..)
>> >
>> > Is there a way I can just override the method as in
>> > AnalyzingInfixSuggester
>> > rather than duplicating classes?
>> >
>> >
>> > Cheers Greg
>> >
>> > On 28 January 2017 at 10:31, Michael McCandless
>> > <lu...@mikemccandless.com>
>> > wrote:
>> >>
>> >> Hi Greg,
>> >>
>> >> OK StandardAnalyzer does indeed use StopFilter, with English stop
>> >> words by default, which includes "will", so this explains what you are
>> >> seeing.
>> >>
>> >> I suggest making your own analyzer just like StandardAnalyzer, except
>> >> instead of StopFilter use the SuggestStopFilter class.
>> >>
>> >> That class was created for exactly the situation you're in, so that
>> >> "will" would not be filtered out as a stop word, but "will " is
>> >> (because it ends with a token separator).
>> >>
>> >> Either that or pass an empty stop word set to StandardAnalyzer, but
>> >> then you have no stop word filtering.
>> >>
>> >> This short blog post explains SuggestStopFilter:
>> >>
>> >>
>> >> http://blog.mikemccandless.com/2013/08/suggeststopfilter-carefully-removes.html
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Sat, Jan 28, 2017 at 3:39 AM, Greg Huber <gr...@gmail.com>
>> >> wrote:
>> >> > Michael,
>> >> >
>> >> > I am using the standard analyzer with no stop words, and the suggester
>> >> > is built from an existing Lucene index.
>> >> >
>> >> > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester
>> >> >
>> >> > I am overriding addContextToQuery to make it an AND rather than an OR:
>> >> >
>> >> > @Override
>> >> > public void addContextToQuery(Builder query, BytesRef context, Occur clause) {
>> >> >     // Always require the context (AND), whatever clause was requested.
>> >> >     query.add(new TermQuery(new Term(CONTEXTS_FIELD_NAME, context)),
>> >> >             BooleanClause.Occur.MUST);
>> >> > }
>> >> >
>> >> > Cheers Greg
>> >> >
>> >> > On 27 January 2017 at 18:20, Michael McCandless
>> >> > <lu...@mikemccandless.com>
>> >> > wrote:
>> >> >>
>> >> >> Which suggester are you using?
>> >> >>
>> >> >> Maybe you are using a suggester with an analyzer, and your analysis
>> >> >> chain includes a StopFilter and "will" is a stop word?
>> >> >>
>> >> >> Mike McCandless
>> >> >>
>> >> >> http://blog.mikemccandless.com
>> >> >>
>> >> >>
>> >> >> On Fri, Jan 27, 2017 at 10:42 AM, Greg Huber <gr...@gmail.com>
>> >> >> wrote:
>> >> >> > Hello,
>> >> >> >
>> >> >> > Is there any way to see why items are returned from the suggester,
>> >> >> > similar to the search?
>> >> >> >
>> >> >> > I have a really strange case where if I enter 'will' (without the
>> >> >> > quotes) it seems to return all the search results.
>> >> >> >
>> >> >> > example:
>> >> >> >
>> >> >> > There should be two entries beginning with will*, i.e. william and
>> >> >> > Willoughby.
>> >> >> >
>> >> >> > wil >  two entries with correct highlight
>> >> >> > will > all entries with NO highlight
>> >> >> > willi > single entry
>> >> >> > willo > single entry
>> >> >> >
>> >> >> > I have checked, and "will" does not appear in all the entries!
>> >> >> >
>> >> >> > Cheers Greg
>> >> >
>> >> >
>> >
>> >
>
>
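
The behaviour explained in the quoted messages can be modelled in a few lines of plain Java. This is a simplified sketch and not Lucene's implementation: the stop-word set, the sample entries, and the names SuggestStopModel, plainStop, suggestStop and lookup are all assumptions made for illustration. It shows how an ordinary stop filter reduces the query "will" to no tokens at all, and how a suggest-style filter keeps the last token as long as no separator follows it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of the discussion above; it does not use Lucene.
class SuggestStopModel {
    // Illustrative subset of English stop words; "will" is the one that matters here.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "the", "will"));

    // Lower-case the input and split it on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Ordinary stop filtering: every stop word is dropped.
    static List<String> plainStop(String text) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(text)) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Suggest-style filtering: the last token survives when nothing follows it,
    // because the user may still be typing that word.
    static List<String> suggestStop(String text) {
        List<String> tokens = tokenize(text);
        boolean endsWithSeparator = !text.isEmpty()
                && Character.isWhitespace(text.charAt(text.length() - 1));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            boolean lastAndUnfinished = i == tokens.size() - 1 && !endsWithSeparator;
            if (lastAndUnfinished || !STOP_WORDS.contains(tokens.get(i))) {
                out.add(tokens.get(i));
            }
        }
        return out;
    }

    // Toy lookup: an entry matches when every query token is a prefix of one of
    // its words. With no query tokens the condition is vacuously true.
    static List<String> lookup(List<String> queryTokens, List<String> entries) {
        List<String> hits = new ArrayList<>();
        for (String entry : entries) {
            List<String> words = tokenize(entry);
            boolean allMatch = true;
            for (String q : queryTokens) {
                boolean anyWord = false;
                for (String w : words) {
                    if (w.startsWith(q)) {
                        anyWord = true;
                        break;
                    }
                }
                if (!anyWord) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("William Smith", "Willoughby Road", "Anna Jones");
        for (String q : new String[] {"wil", "will", "willi"}) {
            System.out.println(q + " plain=" + lookup(plainStop(q), entries)
                    + " suggest=" + lookup(suggestStop(q), entries));
        }
    }
}
```

In this model the plain filter turns "will" into an empty token list, so the toy lookup returns every entry and there is nothing to highlight, which is consistent with the symptom reported in the original post. The suggest-style variant keeps "will" and returns only the William and Willoughby entries, while "will " (with a trailing space) is still treated as a finished stop word and dropped.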



Re: Strange results returned from suggester

Posted by Greg Huber <gr...@gmail.com>.
Mike,

Many thanks, it works perfectly now.

Cheers Greg


Re: Strange results returned from suggester

Posted by Michael McCandless <lu...@mikemccandless.com>.
That's right, just make your own analyzer, forked from
StandardAnalyzer, and swap out the StopFilter.  The analyzer is a
tiny class, and creating your own components in an analyzer like this
is normal practice.

Mike McCandless

http://blog.mikemccandless.com


On Sat, Jan 28, 2017 at 6:09 AM, Greg Huber <gr...@gmail.com> wrote:
> Michael,
>
> Thanks for the update, so I just duplicate StandardAnalyzer and replace :
>
>
> //tok = new StopFilter(tok, stopwords);
>   tok = new SuggestStopFilter(tok, stopwords);
>
> in createComponents(..)
>
> Is there a way I can just override the method as in AnalyzingInfixSuggester
> rather than duplicating classes?
>
>
> Cheers Greg
>
> On 28 January 2017 at 10:31, Michael McCandless <lu...@mikemccandless.com>
> wrote:
>>
>> Hi Greg,
>>
>> OK StandardAnalyzer does indeed use StopFilter, with English stop
>> words by default, which includes "will", so this explains what you are
>> seeing.
>>
>> I suggest making your own analyzer just like StandardAnalyzer, except
>> instead of StopFilter use the SuggestStopFilter class.
>>
>> That class was created for exactly the situation you're in, so that
>> "will" would not be filtered out as a stop word, but "will " is
>> (because it ends with a token separator).
>>
>> Either that or pass an empty stop word set to StandardAnalyzer, but
>> then you have no stop word filtering.
>>
>> This short blog post explains SuggestStopFilter:
>>
>> http://blog.mikemccandless.com/2013/08/suggeststopfilter-carefully-removes.html
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Sat, Jan 28, 2017 at 3:39 AM, Greg Huber <gr...@gmail.com> wrote:
>> > Michael,
>> >
>> > I am using the standard analyzer with no stop words, and the suggester
>> > is built from an existing Lucene index.
>> >
>> > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester
>> >
>> > I am overriding the addContextToQuery to make it an AND rather than an
>> > OR
>> >
>> > public void addContextToQuery(Builder query, BytesRef context,
>> >         Occur clause) {
>> >     query.add(new TermQuery(new Term(CONTEXTS_FIELD_NAME, context)),
>> >             BooleanClause.Occur.MUST);
>> > }
>> >
>> > Cheers Greg
>> >
>> > On 27 January 2017 at 18:20, Michael McCandless
>> > <lu...@mikemccandless.com>
>> > wrote:
>> >>
>> >> Which suggester are you using?
>> >>
>> >> Maybe you are using a suggester with an analyzer, and your analysis
>> >> chain includes a StopFilter and "will" is a stop word?
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Fri, Jan 27, 2017 at 10:42 AM, Greg Huber <gr...@gmail.com>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > Is there anyway to see why items are returned from the suggester?
>> >> > Similar
>> >> > to the search.
>> >> >
>> >> > I have a really strange case where if I enter 'will' (without the
>> >> > quotes)
>> >> > it seems to return all the search results.
>> >> >
>> >> > example:
>> >> >
>> >> > there should be two entries beginning with will*  ie william and
>> >> > Willoughby
>> >> >
>> >> > wil >  two entries with correct highlight
>> >> > will > all entries with NO highlight
>> >> > willi > single entry
>> >> > willo > single entry
>> >> >
>> >> > I have checked and I do not have will on all the entries!
>> >> >
>> >> > Cheers Greg
>> >
>> >
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Strange results returned from suggester

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi Greg,

OK, StandardAnalyzer does indeed use StopFilter, with English stop
words by default, and those include "will", which explains what you are
seeing.

I suggest making your own analyzer just like StandardAnalyzer, except
that it uses the SuggestStopFilter class instead of StopFilter.

That class was created for exactly the situation you're in: "will" is
not filtered out as a stop word while it may still be a prefix, but
"will " is, because it ends with a token separator.

Either that or pass an empty stop word set to StandardAnalyzer, but
then you have no stop word filtering.

This short blog post explains SuggestStopFilter:
http://blog.mikemccandless.com/2013/08/suggeststopfilter-carefully-removes.html
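
The behaviour described above can be sketched in plain Java. This is a toy model only, not Lucene's StopFilter or SuggestStopFilter: the class name SuggestStopSketch, the three-word stop set, the whitespace tokenizer, and the prefix matcher are all assumptions made for illustration. It shows why a query that analyzes to zero tokens places no constraint on the entries (so everything comes back, with nothing to highlight), and how keeping an unfinished last token avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SuggestStopSketch {

    // Tiny illustrative stop set (an assumption); Lucene's English set is larger.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "the", "will"));

    // Lower-case the input and split it on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Plain stop filtering: every stop word is dropped.
    static List<String> plainStop(String text) {
        List<String> kept = new ArrayList<>();
        for (String t : tokenize(text)) {
            if (!STOP_WORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Suggest-style stop filtering: the last token is kept when the input
    // does not end with a separator, because the user may still be typing it.
    static List<String> suggestStop(String text) {
        List<String> tokens = tokenize(text);
        boolean endsWithSeparator = text.isEmpty()
                || Character.isWhitespace(text.charAt(text.length() - 1));
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            boolean unfinishedLast = (i == tokens.size() - 1) && !endsWithSeparator;
            if (unfinishedLast || !STOP_WORDS.contains(tokens.get(i))) {
                kept.add(tokens.get(i));
            }
        }
        return kept;
    }

    // Toy matcher: every query token must be a prefix of some word in the entry.
    // With zero tokens the condition is vacuously true, so every entry matches.
    static boolean matches(String entry, List<String> queryTokens) {
        List<String> words = tokenize(entry);
        for (String q : queryTokens) {
            boolean found = false;
            for (String w : words) {
                if (w.startsWith(q)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical entries standing in for the indexed suggestions.
        List<String> entries = Arrays.asList("William", "Willoughby", "Smith");
        for (String input : Arrays.asList("wil", "will", "will ")) {
            List<String> plain = plainStop(input);
            List<String> suggest = suggestStop(input);
            System.out.println("input=[" + input + "] plain=" + plain
                    + " suggest=" + suggest);
            for (String entry : entries) {
                System.out.println("  " + entry
                        + " plainMatch=" + matches(entry, plain)
                        + " suggestMatch=" + matches(entry, suggest));
            }
        }
    }
}
```

In this model, typing "will" with the plain filter leaves no tokens, so all three entries match; with the suggest-style filter the token survives and only the two entries starting with "will" match.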

Mike McCandless

http://blog.mikemccandless.com


On Sat, Jan 28, 2017 at 3:39 AM, Greg Huber <gr...@gmail.com> wrote:
> Michael,
>
> I am using the standard analyzer with no stop words, and the suggester is
> built from an existing Lucene index.
>
> org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester
>
> I am overriding the addContextToQuery to make it an AND rather than an OR
>
> public void addContextToQuery(Builder query, BytesRef context,
>         Occur clause) {
>     query.add(new TermQuery(new Term(CONTEXTS_FIELD_NAME, context)),
>             BooleanClause.Occur.MUST);
> }
>
> Cheers Greg
>
> On 27 January 2017 at 18:20, Michael McCandless <lu...@mikemccandless.com>
> wrote:
>>
>> Which suggester are you using?
>>
>> Maybe you are using a suggester with an analyzer, and your analysis
>> chain includes a StopFilter and "will" is a stop word?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Jan 27, 2017 at 10:42 AM, Greg Huber <gr...@gmail.com> wrote:
>> > Hello,
>> >
>> > Is there anyway to see why items are returned from the suggester?
>> > Similar
>> > to the search.
>> >
>> > I have a really strange case where if I enter 'will' (without the
>> > quotes)
>> > it seems to return all the search results.
>> >
>> > example:
>> >
>> > there should be two entries beginning with will*  ie william and
>> > Willoughby
>> >
>> > wil >  two entries with correct highlight
>> > will > all entries with NO highlight
>> > willi > single entry
>> > willo > single entry
>> >
>> > I have checked and I do not have will on all the entries!
>> >
>> > Cheers Greg
>
>


Re: Strange results returned from suggester

Posted by Greg Huber <gr...@gmail.com>.
Michael,

I am using the standard analyzer with no stop words, and the suggester is
built from an existing Lucene index.

org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester

I am overriding the addContextToQuery to make it an AND rather than an OR

@Override
public void addContextToQuery(Builder query, BytesRef context,
        Occur clause) {
    // Ignore the requested clause and always require the context (AND).
    query.add(new TermQuery(new Term(CONTEXTS_FIELD_NAME, context)),
            BooleanClause.Occur.MUST);
}
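
To make the AND versus OR difference concrete, here is a plain-Java toy model. It does not use Lucene at all: the class ContextMatchSketch and the context names "books" and "uk" are hypothetical, and the two methods only mimic the effect of combining context clauses as optional (OR) versus required (AND).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ContextMatchSketch {

    // OR semantics: the entry needs at least one of the requested contexts.
    static boolean matchesAny(Set<String> entryContexts, Set<String> requested) {
        for (String context : requested) {
            if (entryContexts.contains(context)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics: the entry needs every requested context.
    static boolean matchesAll(Set<String> entryContexts, Set<String> requested) {
        return entryContexts.containsAll(requested);
    }

    public static void main(String[] args) {
        // Hypothetical contexts attached to one suggestion entry.
        Set<String> entryContexts = new HashSet<>(Arrays.asList("books"));
        // Hypothetical contexts passed with the lookup.
        Set<String> requested = new HashSet<>(Arrays.asList("books", "uk"));

        System.out.println("OR  match: " + matchesAny(entryContexts, requested));
        System.out.println("AND match: " + matchesAll(entryContexts, requested));
    }
}
```

With OR the entry tagged only "books" is still returned when "books" and "uk" are both requested; with AND it is excluded, which is the stricter filtering the override above is aiming for.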

Cheers Greg

On 27 January 2017 at 18:20, Michael McCandless <lu...@mikemccandless.com>
wrote:

> Which suggester are you using?
>
> Maybe you are using a suggester with an analyzer, and your analysis
> chain includes a StopFilter and "will" is a stop word?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Jan 27, 2017 at 10:42 AM, Greg Huber <gr...@gmail.com> wrote:
> > Hello,
> >
> > Is there anyway to see why items are returned from the suggester?
> Similar
> > to the search.
> >
> > I have a really strange case where if I enter 'will' (without the quotes)
> > it seems to return all the search results.
> >
> > example:
> >
> > there should be two entries beginning with will*  ie william and
> Willoughby
> >
> > wil >  two entries with correct highlight
> > will > all entries with NO highlight
> > willi > single entry
> > willo > single entry
> >
> > I have checked and I do not have will on all the entries!
> >
> > Cheers Greg
>

Re: Strange results returned from suggester

Posted by Michael McCandless <lu...@mikemccandless.com>.
Which suggester are you using?

Maybe you are using a suggester with an analyzer, and your analysis
chain includes a StopFilter and "will" is a stop word?

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jan 27, 2017 at 10:42 AM, Greg Huber <gr...@gmail.com> wrote:
> Hello,
>
> Is there anyway to see why items are returned from the suggester?  Similar
> to the search.
>
> I have a really strange case where if I enter 'will' (without the quotes)
> it seems to return all the search results.
>
> example:
>
> there should be two entries beginning with will*  ie william and Willoughby
>
> wil >  two entries with correct highlight
> will > all entries with NO highlight
> willi > single entry
> willo > single entry
>
> I have checked and I do not have will on all the entries!
>
> Cheers Greg
