Posted to solr-user@lucene.apache.org by Prathik Puthran <pr...@gmail.com> on 2013/06/05 17:59:43 UTC

Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Hi,

Is it possible to configure Solr to suggest the indexed string for any
search on a substring of that string?

Thanks,
Prathik

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Got it. It's actually in contrast to the usual prefix suggestions.
Out of the box, it is provided by the terms.regex= parameter of
http://wiki.apache.org/solr/TermsComponent (also see the last example
there). It should work by loading the terms into memory and scanning them
linearly with the regexp.
There is nothing more efficient out of the box.
http://wiki.apache.org/solr/Suggester says support for infix suggestions
_is planned_ for FSTLookup (which would be the only structure to support
these).
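
A minimal sketch of what such a terms.regex request could look like. The
host, the /terms handler path, and the field name name_exact are
placeholders that do not come from this thread; terms.fl, terms.regex,
terms.regex.flag and terms.limit are the parameter names described on the
TermsComponent wiki page. It assumes the field is untokenized, so that
"Jason Bourne" is stored as a single term.

```python
import re
from urllib.parse import urlencode


def build_terms_regex_url(base_url, field, substring, limit=10):
    """Build a TermsComponent URL matching terms that contain `substring`."""
    params = {
        "terms": "true",
        "terms.fl": field,
        # Python's escaping is close to, but not guaranteed identical to,
        # the Java regex syntax Solr uses; plain letters pass through as-is.
        "terms.regex": ".*" + re.escape(substring) + ".*",
        "terms.regex.flag": "case_insensitive",
        "terms.limit": limit,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


# Hypothetical host and field name, for illustration only.
print(build_terms_regex_url("http://localhost:8983/solr", "name_exact", "Bour"))
```

The request still scans the terms linearly on the server, so this is only
reasonable for a small term dictionary.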


On Thu, Jun 6, 2013 at 10:25 AM, Prathik Puthran <
prathik.puthran87@gmail.com> wrote:

> My use case is I want to search for any substring of the indexed string and
> the Suggester should suggest the indexed string. What can I do to make this
> work?
>
> Thanks,
> Prathik



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Prathik Puthran <pr...@gmail.com>.
Our dictionary has very few words, so it is more of a feature to the
user than a nuisance.

Thanks,
Prathik


On Mon, Jun 10, 2013 at 10:52 PM, Walter Underwood <wu...@wunderwood.org>wrote:

> Why do you think that is useful? That will give terrible search results.
>
> Here are the first twenty words in /usr/share/dict/words that contain the
> substring "cat".
>
> abacate
> abdicate
> abdication
> abdicative
> abdicator
> aberuncator
> abjudicate
> abjudication
> acacatechin
> acacatechol
> acatalectic
> acatalepsia
> acatalepsy
> acataleptic
> acatallactic
> acatamathesia
> acataphasia
> acataposis
> acatastasia
> acatastatic
>
> wunder
>
> --
> Walter Underwood
> wunder@wunderwood.org
>
>
>
>

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Walter Underwood <wu...@wunderwood.org>.
Why do you think that is useful? That will give terrible search results. 

Here are the first twenty words in /usr/share/dict/words that contain the substring "cat".

abacate
abdicate
abdication
abdicative
abdicator
aberuncator
abjudicate
abjudication
acacatechin
acacatechol
acatalectic
acatalepsia
acatalepsy
acataleptic
acatallactic
acatamathesia
acataphasia
acataposis
acatastasia
acatastatic

wunder
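
A list like the one above can be reproduced with a few lines of Python.
This is only a sketch; the function name first_matches and the sample
words are made up for illustration.

```python
from itertools import islice


def first_matches(words, substring, limit=20):
    """Return the first `limit` words that contain `substring`."""
    needle = substring.lower()
    matches = (w for w in words if needle in w.lower())
    return list(islice(matches, limit))


# Small made-up sample instead of a full word list.
sample = ["abacate", "abacus", "abdicate", "cat", "dog", "acatalepsy"]
print(first_matches(sample, "cat", limit=3))
```

To run it against the real word list, pass the stripped lines of
/usr/share/dict/words as `words`.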

On Jun 9, 2013, at 10:56 PM, Prathik Puthran wrote:

> Hi,
> 
> @Walter
> I'm trying to implement the below feature for the user.
> User types in any "substring" of the strings in the dictionary (i.e. the
> indexed string) .
> SOLR Suggester should return all the strings in the dictionary which has
> the input string as substring.
> 
> Thanks,
> Prathik

--
Walter Underwood
wunder@wunderwood.org




Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Prathik Puthran <pr...@gmail.com>.
Hi,

@Walter
I'm trying to implement the following feature for the user.
The user types in any substring of one of the strings in the dictionary
(i.e. the indexed strings).
The Solr Suggester should return all the strings in the dictionary that
contain the input string as a substring.

Thanks,
Prathik



On Fri, Jun 7, 2013 at 4:01 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com
> wrote:

> Hi
>
> Ngrams *will* do this for you.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi

Ngrams *will* do this for you.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jun 6, 2013 7:53 AM, "Prathik Puthran" <pr...@gmail.com>
wrote:

> Basically I want the Suggester to return for "Jason Bourne" as suggestion
> for ".*Bour.*" regex.
>
> Thanks,
> Prathik

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Walter Underwood <wu...@wunderwood.org>.
Let's clear up some things about how Solr works.

1. Solr matches individual words, not the whole text. So "Jason Bourne" is split into ["Jason", "Bourne"]. The leading ".*" in your pattern does not match preceding words, it would match the beginning of a single word.

2. Query time wildcards test every word in the index. This might be a billion words. Of course that is slow. This is why we try to do things at index time. With ngrams, there is one lookup, not a billion wildcard matches.

3. Regexes will almost always be the slowest way to do something in Solr, and are almost always too slow for production.

Now, what are you trying to do for the user? It seems like you have decided on a solution and are asking about that.

Solr already has many built-in solutions, so if we know the root problem, we may find an easy solution.

wunder
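
Point 1 can be illustrated outside Solr. The sketch below assumes a plain
whitespace split as a stand-in for a real analyzer, and it matches the
pattern against each whole term, which is how the message describes Solr
behaving. The helper names are made up.

```python
import re


def tokenize(text):
    """Whitespace split; a stand-in for a real Solr analysis chain."""
    return text.split()


def matching_terms(text, pattern):
    """Return the terms of `text` that the pattern matches in full."""
    regex = re.compile(pattern)
    return [term for term in tokenize(text) if regex.fullmatch(term)]


print(matching_terms("Jason Bourne", ".*Bour.*"))      # matches one term
print(matching_terms("Jason Bourne", "Jason Bour.*"))  # spans two terms
```

The second pattern finds nothing because no single term contains both
words; the leading ".*" can only absorb characters inside one term.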

On Jun 6, 2013, at 4:53 AM, Prathik Puthran wrote:

> Basically I want the Suggester to return for "Jason Bourne" as suggestion
> for ".*Bour.*" regex.
> 
> Thanks,
> Prathik





Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Prathik Puthran <pr...@gmail.com>.
Basically I want the Suggester to return "Jason Bourne" as a suggestion
for the ".*Bour.*" regex.

Thanks,
Prathik


On Thu, Jun 6, 2013 at 12:52 PM, Prathik Puthran <
prathik.puthran87@gmail.com> wrote:

> This works even now i.e. when I search for "Jas" it suggests "Jason
> Bourne". What I want is when I search for "Bour" or "ason" (any substring)
> it should suggest me "Jason Bourne" .

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Prathik Puthran <pr...@gmail.com>.
This already works, i.e. when I search for "Jas" it suggests "Jason
Bourne". What I want is that when I search for "Bour" or "ason" (any
substring), it suggests "Jason Bourne".


On Thu, Jun 6, 2013 at 12:34 PM, Upayavira <uv...@odoko.co.uk> wrote:

> Can you se the ShingleFilterFactory? It is ngrams for terms rather than
> characters. If you limited it to two term ngrams, when the user presses
> space after their first word, you could do a suggested query against
> your two term ngram field, which would suggest Jason Bourne, Jason
> Statham, etc then you press space after "Jason".
>
> Upayavira

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Upayavira <uv...@odoko.co.uk>.
Can you use the ShingleFilterFactory? It is ngrams for terms rather than
characters. If you limited it to two-term ngrams, then when the user
presses space after their first word, you could run a suggestion query
against your two-term ngram field, which would suggest Jason Bourne, Jason
Statham, etc. when you press space after "Jason".

Upayavira
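
A rough sketch of the two-term shingle idea in plain Python. It does not
use Solr; the function names and the sample names list are invented, and
a whitespace split stands in for the tokenizer. Note that the real
ShingleFilterFactory can also emit single terms alongside the shingles,
which this sketch leaves out.

```python
def shingles(tokens, size=2):
    """Return word n-grams ("shingles") of exactly `size` tokens."""
    return [" ".join(tokens[i:i + size])
            for i in range(len(tokens) - size + 1)]


def build_shingle_set(entries, size=2):
    """Collect the shingles of every entry into one set."""
    result = set()
    for entry in entries:
        result.update(shingles(entry.split(), size))
    return result


def suggest(shingle_set, typed):
    """Suggest shingles that start with what the user has typed so far."""
    prefix = typed.lower()
    return sorted(s for s in shingle_set if s.lower().startswith(prefix))


names = ["Jason Bourne", "Jason Statham", "Matt Damon"]
print(suggest(build_shingle_set(names), "Jason "))
```

This covers the case where the user has typed a whole first word; it does
not by itself match fragments such as "ason" or "Bour".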

On Thu, Jun 6, 2013, at 07:25 AM, Prathik Puthran wrote:
> My use case is I want to search for any substring of the indexed string
> and
> the Suggester should suggest the indexed string. What can I do to make
> this
> work?
> 
> Thanks,
> Prathik

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Prathik Puthran <pr...@gmail.com>.
My use case is I want to search for any substring of the indexed string and
the Suggester should suggest the indexed string. What can I do to make this
work?

Thanks,
Prathik


On Thu, Jun 6, 2013 at 2:05 AM, Mikhail Khludnev <mkhludnev@griddynamics.com
> wrote:

> Please excuse my misunderstanding, but I always wonder why this index time
> processing is suggested usually. from my POV is the case for query-time
> processing i.e. PrefixQuery aka wildcard query Jason* .
> Ultra-fast term retrieval also provided by TermsComponent.

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Please excuse my misunderstanding, but I always wonder why index-time
processing is usually suggested for this. From my POV this is a case for
query-time processing, i.e. PrefixQuery, aka the wildcard query Jason* .
Ultra-fast term retrieval is also provided by TermsComponent.
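
A sketch of why prefix lookups are cheap at query time while substring
lookups are not, assuming the terms are kept in a sorted list (a
simplification of a real term dictionary). The function names are made up
for illustration.

```python
from bisect import bisect_left


def prefix_terms(sorted_terms, prefix):
    """Jump to the first candidate, then read until the prefix stops matching."""
    result = []
    for i in range(bisect_left(sorted_terms, prefix), len(sorted_terms)):
        if not sorted_terms[i].startswith(prefix):
            break
        result.append(sorted_terms[i])
    return result


def infix_terms(sorted_terms, fragment):
    """No ordering helps here: every term has to be inspected."""
    return [term for term in sorted_terms if fragment in term]


terms = sorted(["Bourne", "Jason", "Jasper", "Statham"])
print(prefix_terms(terms, "Jas"))
print(infix_terms(terms, "ourn"))
```

The prefix case touches only the matching neighbourhood of the sorted
list; the infix case is a full scan, which is the situation the rest of
the thread is about.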


On Wed, Jun 5, 2013 at 8:09 PM, Jack Krupansky <ja...@basetechnology.com>wrote:

> ngrams?
>
> See:
> http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html
>
> -- Jack Krupansky



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Prathik Puthran <pr...@gmail.com>.
ngrams won't work here. If I index all the ngrams of the string, then when
I search for some string it suggests all the ngrams as well.
E.g.:
The dictionary contains the string "Jason Bourne" and you index all the
ngrams of that string.
When I search for "Jason", Solr suggests all the ngrams containing
"Jason". Instead of just suggesting "Jason Bourne", Lucene suggests
"Jason B", "Jason Bo", "Jason Bou", "Jason Bour", "Jason Bourn", "Jason
Bourne".

What should I do so that I get only "Jason Bourne" as the suggestion when
the user searches for any substring of it ("Bour", "Bourne", etc.)?
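
One way to think about this, sketched in plain Python rather than Solr
configuration: use the ngrams only as lookup keys and keep the original
string as the value that is returned. In Solr terms this would roughly
mean searching an ngram-analyzed field and returning the stored original
value instead of reading suggestions from the ngram terms themselves;
that mapping is an assumption here, not something confirmed in this
thread. All names below are invented.

```python
from collections import defaultdict


def build_index(entries, min_gram=2):
    """Map every lowercase character ngram to the entries containing it."""
    index = defaultdict(set)
    for entry in entries:
        lowered = entry.lower()
        for start in range(len(lowered)):
            for end in range(start + min_gram, len(lowered) + 1):
                index[lowered[start:end]].add(entry)
    return index


def suggest(index, typed):
    """One dictionary lookup; returns whole entries, never the ngrams."""
    return sorted(index.get(typed.lower(), ()))


index = build_index(["Jason Bourne", "Jason Statham"])
print(suggest(index, "Bour"))
print(suggest(index, "ason"))
```

The index grows quadratically with the length of each entry, which is
acceptable for a small dictionary but worth measuring for a large one.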


On Wed, Jun 5, 2013 at 9:39 PM, Jack Krupansky <ja...@basetechnology.com>wrote:

> ngrams?
>
> See:
> http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html
>
> -- Jack Krupansky

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Posted by Jack Krupansky <ja...@basetechnology.com>.
ngrams?

See:
http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html

-- Jack Krupansky
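
For readers unfamiliar with character ngrams, here is a small Python
sketch of what an ngram filter produces for one token. The min_gram and
max_gram arguments mirror the minGramSize and maxGramSize settings of
NGramFilterFactory; the function itself is illustrative and the order of
its output is not meant to match Lucene's.

```python
def char_ngrams(token, min_gram, max_gram):
    """Return all substrings of `token` with length min_gram..max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


print(char_ngrams("Bourne", 3, 4))
```

With grams of length 4 indexed, a query for "Bour" becomes an exact term
match against the gram "Bour".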

-----Original Message----- 
From: Prathik Puthran
Sent: Wednesday, June 05, 2013 11:59 AM
To: solr-user@lucene.apache.org
Subject: Configuring lucene to suggest the indexed string for all the 
searches of the substring of the indexed string

Hi,

Is it possible to configure solr to suggest the indexed string for all the
searches of the substring of the string?

Thanks,
Prathik