Posted to solr-user@lucene.apache.org by Bhavnik Gajjar <bh...@gatewaynintec.com> on 2010/08/02 16:39:47 UTC

enhancing auto complete

Hi,

I'm looking for a solution for an auto-complete feature in one of our
applications.

Below is the list of texts from which the auto-complete results would be
populated.

Lorem ipsum dolor sit amet
tincidunt ut laoreet
dolore eu feugiat nulla facilisis at vero eros et
te feugait nulla facilisi
Claritas est etiam processus
anteposuerit litterarum formas humanitatis
fiant sollemnes in futurum
Hieyed ddi lorem ipsum dolor
test lorem ipsume
test xyz lorem ipslili

Consider the table below. The first column is the value entered by the user,
and the second column is the expected result (the list of auto-complete terms
that Solr should return).

lorem         *Lorem* ipsum dolor sit amet
              Hieyed ddi *lorem* ipsum dolor
              test *lorem* ipsume
              test xyz *lorem* ipslili

lorem ip      *Lorem ip*sum dolor sit amet
              Hieyed ddi *lorem ip*sum dolor
              test *lorem ip*sume
              test xyz *lorem ip*slili

lorem ipsl    test xyz *lorem ipsl*ili

Can anyone share ideas on how this can be achieved with Solr? I have already
tried various tokenizers and filter factories, such as
WhitespaceTokenizer, KeywordTokenizer, EdgeNGramFilterFactory and
ShingleFilterFactory, but no luck so far.

It would also be excellent if the matched part of the terms returned from
Solr could be highlighted, using Highlighting or any other Solr component or
mechanism.
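
For reference, the expected behaviour in the table can be written down as a
small matching rule. The following is a minimal sketch in plain Python, not
Solr code. It assumes that a suggestion matches when the typed text starts at
a word boundary (case-insensitively), and it marks the matched span with
asterisks as in the table. The names PHRASES and suggest are made up for this
illustration.

```python
import re

PHRASES = [
    "Lorem ipsum dolor sit amet",
    "tincidunt ut laoreet",
    "dolore eu feugiat nulla facilisis at vero eros et",
    "te feugait nulla facilisi",
    "Claritas est etiam processus",
    "anteposuerit litterarum formas humanitatis",
    "fiant sollemnes in futurum",
    "Hieyed ddi lorem ipsum dolor",
    "test lorem ipsume",
    "test xyz lorem ipslili",
]


def suggest(typed, phrases=PHRASES):
    """Return phrases in which `typed` starts at a word boundary,
    with the matched span wrapped in asterisks."""
    # \b anchors the match to the start of a word; re.escape keeps the
    # user input from being interpreted as a regular expression.
    pattern = re.compile(r"\b" + re.escape(typed), re.IGNORECASE)
    results = []
    for phrase in phrases:
        match = pattern.search(phrase)
        if match:
            start, end = match.span()
            results.append(
                phrase[:start] + "*" + phrase[start:end] + "*" + phrase[end:]
            )
    return results
```

With the sample data, suggest("lorem ipsl") returns only
"test xyz *lorem ipsl*ili", and suggest("lorem") returns the four phrases
listed in the first row of the table.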

*Note:* Standard autocomplete, for example
facet.field=AutoComplete&f.AutoComplete.facet.prefix=<user entered term>&f.AutoComplete.facet.limit=10&facet.sort&rows=0
is already working fine in the application. We are now looking to enhance
the existing auto-complete with the requirement above.
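
As an illustration of the existing approach, here is a minimal Python sketch
that builds such a facet.prefix request. The endpoint URL and the helper name
facet_prefix_url are assumptions made for this example. q=*:* and facet=true
are added on the assumption that the request handler does not already supply
them as defaults, and facet.sort is left at its default.

```python
from urllib.parse import urlencode

# Assumed Solr endpoint; adjust host, port and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def facet_prefix_url(typed, field="AutoComplete", limit=10):
    """Build a facet.prefix autocomplete request for `typed`."""
    params = [
        ("q", "*:*"),        # match all documents; only the facets are used
        ("rows", 0),         # return no documents in the response
        ("facet", "true"),   # faceting has to be switched on explicitly
        ("facet.field", field),
        ("f.%s.facet.prefix" % field, typed),
        ("f.%s.facet.limit" % field, limit),
    ]
    # urlencode percent-encodes the values, so user input is safe in the URL.
    return SOLR_SELECT_URL + "?" + urlencode(params)
```

Because facet.prefix only keeps indexed terms that start with the given
string, this style of request can return "Lorem ipsum dolor sit amet" for
"lorem" only if the indexed term itself begins with that prefix. It cannot
return "test lorem ipsume", which is why it does not cover the table above.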

Any thoughts?

Thanks in advance




The contents of this eMail including the contents of attachment(s) are privileged and confidential material of Gateway NINtec Pvt. Ltd. (GNPL) and should not be disclosed to, used by or copied in any manner by anyone other than the intended addressee(s). If this eMail has been received by error, please advise the sender immediately and delete it from your system. The views expressed in this eMail message are those of the individual sender, except where the sender expressly, and with authority, states them to be the views of GNPL. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this eMail or any action taken in reliance on this eMail is strictly prohibited and may be unlawful. This eMail may contain viruses. GNPL has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this eMail. You should carry out your own virus checks before opening the eMail or attachment(s). GNPL is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt. GNPL reserves the right to monitor and review the content of all messages sent to or from this eMail address and may be stored on the GNPL eMail system. In case this eMail has reached you in error, and you  would no longer like to receive eMails from us, then please send an eMail to dnd@gatewaynintec.com

Re: enhancing auto complete

Posted by sc...@asia.com.
OK, I'm still interested in the design.

Re: enhancing auto complete

Posted by Avlesh Singh <av...@gmail.com>.
Hahaha ... sorry, it's not. And there is no ready-made code that I can give you
either. But yes, if you liked it, I can share the design of this feature
(Solr, backend and frontend).

Cheers
Avlesh
@avlesh <http://twitter.com/avlesh> | http://webklipper.com

On Mon, Aug 2, 2010 at 8:47 PM, <sc...@asia.com> wrote:

>
> Hi, I'm also interested in this feature. Is it open source?

Re: enhancing auto complete

Posted by sc...@asia.com.
Hi, I'm also interested in this feature. Is it open source?

-----Original Message-----
From: Avlesh Singh <av...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Mon, Aug 2, 2010 5:09 pm
Subject: Re: enhancing auto complete

From whatever I could read in your broken table of sample use cases, I think
you are looking for something similar to what has been done here:
http://askme.in. If this is what you are looking for, do let me know.

Cheers
Avlesh
@avlesh <http://twitter.com/avlesh> | http://webklipper.com




 

Re: enhancing auto complete

Posted by Bhavnik Gajjar <bh...@gatewaynintec.com>.
Thanks, Avlesh, for sharing the info. Will try it!

In the meantime, I also found another solution:
http://metaoptimize.com/qa/questions/17/stemming-problems-when-writing-search-auto-complete

Kind regards.

On 8/4/2010 9:13 PM, Avlesh Singh wrote:
> I preferred to answer this question privately earlier. But I have received
> innumerable requests to unveil the architecture. For the benefit of all, I
> am posting it here (after hiding as much info as I should, in my company's
> interest).


-- 
Regards,
*Bhavnik Gajjar*
www.gatewaynintec.com <http://www.gatewaynintec.com>

*Mobile:* +91-9998436253 *Phone: *+91 79 2685 2554 / 5 / 6
*MSN: *bhavnik.gajjar@gatewaynintec.com




Re: enhancing auto complete

Posted by Avlesh Singh <av...@gmail.com>.
I preferred to answer this question privately earlier, but I have received
innumerable requests to unveil the architecture. For the benefit of all, I
am posting it here (after hiding as much info as I should, in my company's
interest).

The context: Auto-suggest feature on http://askme.in

*Solr setup*: Some of the salient features are listed below.

   1. TermsComponent is NOT used.
   2. The index is made up of 4 fields of the following types:
   "autocomplete_full", "autocomplete_token", "string" and "text".
   3. "autocomplete_full" uses KeywordTokenizerFactory and
   EdgeNGramFilterFactory. "autocomplete_token" uses WhitespaceTokenizerFactory
   and EdgeNGramFilterFactory. Both of these are Solr text fields with standard
   filters like LowerCaseFilterFactory etc. applied during querying and
   indexing.
   4. The standard DataImportHandler and a bunch of SQL procedures are used to
   "derive" all suggestable phrases from the system and index them in the
   above-mentioned fields.
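
To make the difference between the two autocomplete field types concrete,
here is a minimal sketch in plain Python that imitates the analysis chains
described in point 3. It is not the askme.in configuration: the function
names and the min_gram/max_gram values are assumptions, and the n-grams are
generated on the index side only.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Prefixes of `token` from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_full(phrase):
    """Imitates "autocomplete_full": the whole phrase is kept as one
    token, lowercased, then expanded into edge n-grams."""
    return set(edge_ngrams(phrase.lower()))


def index_token(phrase):
    """Imitates "autocomplete_token": the phrase is split on whitespace,
    lowercased, and each word is expanded into edge n-grams."""
    terms = set()
    for word in phrase.lower().split():
        terms.update(edge_ngrams(word))
    return terms
```

In this sketch "lorem ip" is an indexed term of
index_full("Lorem ipsum dolor sit amet") but not of
index_full("test lorem ipsume"), while "lorem" is an indexed term of
index_token("test lorem ipsume"). The full-phrase field therefore covers
suggestions that start with the typed text, and the token field covers
suggestions that contain a word starting with it.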

*Controller setup*: The controller (to handle suggest queries) is a typical
Java servlet using Solr as its backend (connecting via SolrJ). Based on the
incoming query string, a Lucene query is created. It is a BooleanQuery
composed of TermQuery clauses across all the above-mentioned fields. The boost
factor on each of these term queries determines (to an extent) which
kind of matches show up first. JSON is used as the data
exchange format.
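
The effect of per-field boosts can be illustrated with a small simulation.
This is a sketch in plain Python, not the actual servlet and not SolrJ: the
boost values, the helper names and the scoring rule (the sum of the boosts of
the fields that contain the typed term) are assumptions, and real Lucene
scoring takes more factors into account.

```python
# Hypothetical boosts: a full-phrase prefix match outranks a word-level match.
BOOSTS = {"autocomplete_full": 10.0, "autocomplete_token": 1.0}


def prefixes(text):
    """All non-empty prefixes of `text` (a stand-in for edge n-grams)."""
    return {text[:n] for n in range(1, len(text) + 1)}


def analyze(phrase):
    """Indexed terms per field for one phrase."""
    lowered = phrase.lower()
    token_terms = set()
    for word in lowered.split():
        token_terms |= prefixes(word)
    return {
        "autocomplete_full": prefixes(lowered),
        "autocomplete_token": token_terms,
    }


def rank(typed, phrases):
    """Order phrases by the summed boost of every field that has `typed`
    as an indexed term; phrases with no matching field are dropped."""
    term = typed.lower()
    scored = []
    for phrase in phrases:
        fields = analyze(phrase)
        score = sum(b for name, b in BOOSTS.items() if term in fields[name])
        if score > 0:
            scored.append((score, phrase))
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in scored]
```

With these boosts, rank("lorem", ...) places "Lorem ipsum dolor sit amet"
ahead of "test lorem ipsume", because the former matches in both fields and
the latter only in the token field. Changing the values in BOOSTS changes
which kind of match shows up first.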

*Frontend setup*: It is home-grown JS written to address some specific use
cases of the project in question. One simple exercise with Firebug will spill
all the beans. However, I strongly recommend using jQuery to build (and extend)
the UI component.

Any help beyond this is available, but off the list.

Cheers
Avlesh
@avlesh <http://twitter.com/avlesh> | http://webklipper.com


Re: enhancing auto complete

Posted by Bhavnik Gajjar <bh...@gatewaynintec.com>.
Whoops!

The table still does not look OK :(

Trying to send it once again:

lorem         Lorem ipsum dolor sit amet
              Hieyed ddi lorem ipsum dolor
              test lorem ipsume
              test xyz lorem ipslili

lorem ip      Lorem ipsum dolor sit amet
              Hieyed ddi lorem ipsum dolor
              test lorem ipsume
              test xyz lorem ipslili

lorem ipsl    test xyz lorem ipslili
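
To pin down the matching rule the table implies, here is a minimal Python sketch. It is not a Solr configuration, only a reference for the expected behaviour: the typed text must match starting at a word boundary, and its last word may be a partial prefix. The function name `suggest` and the `*...*` highlight markers are assumptions chosen for illustration.

```python
import re

# Sample texts taken from the original question.
TEXTS = [
    "Lorem ipsum dolor sit amet",
    "tincidunt ut laoreet",
    "dolore eu feugiat nulla facilisis at vero eros et",
    "te feugait nulla facilisi",
    "Claritas est etiam processus",
    "anteposuerit litterarum formas humanitatis",
    "fiant sollemnes in futurum",
    "Hieyed ddi lorem ipsum dolor",
    "test lorem ipsume",
    "test xyz lorem ipslili",
]


def suggest(typed, texts=TEXTS):
    """Return texts that contain `typed` starting at a word boundary.

    The matched span is wrapped in *...* to mimic highlighting.
    """
    words = typed.split()
    if not words:
        return []
    # \b anchors the match at the start of a word. Typed words are joined
    # by whitespace. There is no trailing \b, so the last typed word may
    # be only a prefix of the word in the text.
    pattern = re.compile(
        r"\b" + r"\s+".join(re.escape(w) for w in words),
        re.IGNORECASE,
    )
    results = []
    for text in texts:
        match = pattern.search(text)
        if match:
            start, end = match.span()
            results.append(text[:start] + "*" + text[start:end] + "*" + text[end:])
    return results


if __name__ == "__main__":
    for typed in ("lorem", "lorem ip", "lorem ipsl"):
        print(typed, "->", suggest(typed))
```

Any Solr analysis chain for this feature should reproduce these three rows; the sketch is only a way to check candidate configurations against the table.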





Re: enhancing auto complete

Posted by Bhavnik Gajjar <bh...@gatewaynintec.com>.
Avlesh,

Thanks for responding.

The table mentioned below should look like this:

lorem         Lorem ipsum dolor sit amet
              Hieyed ddi lorem ipsum dolor
              test lorem ipsume
              test xyz lorem ipslili

lorem ip      Lorem ipsum dolor sit amet
              Hieyed ddi lorem ipsum dolor
              test lorem ipsume
              test xyz lorem ipslili

lorem ipsl    test xyz lorem ipslili


Yes, [http://askme.in] looks good!

I would like to know about its design, Solr configuration, etc. Can you
please share the details?

There is one thing to note about [http://askme.in]. Search text like
[business c] suggests [Business Centre], which looks OK, but [Consultant
Business] looks a bit odd. In general, though, the pointer you suggested
is a great place to start.
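
The difference between the two suggestions can be illustrated with a small Python sketch. It rests on an assumption, since the thread does not say how askme.in is configured: that the site matches each typed token as a prefix of any word, in any order. The function names below are hypothetical and are not Solr APIs.

```python
def tokens(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def matches_any_order(typed, candidate):
    """Every typed token is a prefix of some candidate word, in any position."""
    cand = tokens(candidate)
    return all(any(c.startswith(t) for c in cand) for t in tokens(typed))


def matches_phrase_prefix(typed, candidate):
    """Typed tokens match consecutive candidate words in order.

    Only the last typed token may be a partial prefix.
    """
    typed_toks, cand = tokens(typed), tokens(candidate)
    n = len(typed_toks)
    if n == 0:
        return False
    for i in range(len(cand) - n + 1):
        window = cand[i:i + n]
        if window[:-1] == typed_toks[:-1] and window[-1].startswith(typed_toks[-1]):
            return True
    return False


if __name__ == "__main__":
    for candidate in ("Business Centre", "Consultant Business"):
        print(
            candidate,
            matches_any_order("business c", candidate),
            matches_phrase_prefix("business c", candidate),
        )
```

Under the any-order rule both suggestions match [business c]; under the phrase-prefix rule only [Business Centre] does, which is the behaviour my table asks for.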





Re: enhancing auto complete

Posted by Avlesh Singh <av...@gmail.com>.
From whatever I could read in your broken table of sample use cases, I think
you are looking for something similar to what has been done here -
http://askme.in; if this is what you are looking for, do let me know.

Cheers
Avlesh
@avlesh <http://twitter.com/avlesh> | http://webklipper.com
