Posted to solr-user@lucene.apache.org by Ian Holsman <li...@holsman.net> on 2009/06/19 09:20:52 UTC

Auto suggest.. how to do mixed case

hi guys.

I've noticed that one of the new features in Solr 1.4 is the TermsComponent,
which enables autosuggest.

but what puzzles me is how to actually use it in an application.

most autosuggests are case insensitive, so there is no difference if I type
in 'San Francisco' or 'san francisco'.

now I've tried with a 'text' field and a 'string' field, with no joy. the
'string' field gives the best results, but it is still case sensitive.

at the moment I'm using a custom field type

    <fieldType name="string_lc" class="solr.TextField"
               sortMissingLast="true" omitNorms="true">
      <analyzer>
        <!-- KeywordTokenizer does no actual tokenizing, so the entire
             input string is preserved as a single token
          -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- The LowerCase TokenFilter does what you expect, which can be
             useful when you want your sorting to be case insensitive
          -->
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>

which converts the whole field to lower case, so I can submit the query in
lower case and get good results.

so the point of the email is to find out how I can get the autosuggest to
return mixed case results, without having to lower case the query before
I send it.

Re: Auto suggest.. how to do mixed case

Posted by Mani Kumar <ma...@gmail.com>.
hi shalin,
could you please share code or tutorial documents for the techniques below?
it would be a great help.

>   1. Prefix search on shingles
>   2. Exact (phrase) search on n-grams
>
> The regular prefix search also works. The good thing with these is that you
> can apply filters, and the stored value can differ from the indexed one.


thanks!
mani

On Mon, Jun 22, 2009 at 4:41 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> On Mon, Jun 22, 2009 at 2:55 PM, Ingo Renner <in...@typo3.org> wrote:
>
> >
> > Hi Shalin,
> >
> >  I think
> >> that by naming it as /autoSuggest, a lot of users have been misled since
> >> there are other techniques available.
> >>
> >
> > what would you suggest?
> >
> >
> There are many techniques. Personally, I've used
>
>   1. Prefix search on shingles
>   2. Exact (phrase) search on n-grams
>
> The regular prefix search also works. The good thing with these is that you
> can apply filters, and the stored value can differ from the indexed one.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Auto suggest...

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Mon, Jun 22, 2009 at 4:55 PM, Paul Libbrecht <pa...@activemath.org> wrote:

> I'm not sure I'm understanding fully this thread,
>
> on the one hand it speaks about tuning the appropriate analyzer to get
> mixed case matching...
> This part I am not addressing and I zapped that part of the subject.
>
> on the other hand it seems to speak about an auto-suggestion facility?
> Is this http://wiki.apache.org/solr/SolrJS ?


No. In the past, the TermsComponent was registered in the example
solrconfig.xml as /autoSuggest, which seems to suggest that it is *the* way
to get auto-suggest support in Solr. This is what I was referring to when I
said that users may have been misled.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Auto suggest...

Posted by Paul Libbrecht <pa...@activemath.org>.
I'm not sure I'm understanding fully this thread,

on the one hand it speaks about tuning the appropriate analyzer to get  
mixed case matching...
This part I am not addressing and I zapped that part of the subject.

on the other hand it seems to speak about an auto-suggestion facility?
Is this http://wiki.apache.org/solr/SolrJS ?
That page doesn't describe much of the server interface (e.g. the  
field types, the type of queries, how to fuzzify them).

Are there other such plans in Solr?
If it is useful, we have such an auto-completion built with GWT, under
APL, at http://i2geo.net/, where we intend to move to Solr soon.

paul


On 22 June 2009 at 13:11, Shalin Shekhar Mangar wrote:

> On Mon, Jun 22, 2009 at 2:55 PM, Ingo Renner <in...@typo3.org> wrote:
>
>>
>> Hi Shalin,
>>
>> I think
>>> that by naming it as /autoSuggest, a lot of users have been misled  
>>> since
>>> there are other techniques available.
>>>
>>
>> what would you suggest?
>>
>>
> There are many techniques. Personally, I've used
>
>   1. Prefix search on shingles
>   2. Exact (phrase) search on n-grams
>
> The regular prefix search also works. The good thing with these is that you
> can apply filters, and the stored value can differ from the indexed one.
>
> -- 
> Regards,
> Shalin Shekhar Mangar.


Re: Auto suggest.. how to do mixed case

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Mon, Jun 22, 2009 at 2:55 PM, Ingo Renner <in...@typo3.org> wrote:

>
> Hi Shalin,
>
>  I think
>> that by naming it as /autoSuggest, a lot of users have been misled since
>> there are other techniques available.
>>
>
> what would you suggest?
>
>
There are many techniques. Personally, I've used

   1. Prefix search on shingles
   2. Exact (phrase) search on n-grams

The regular prefix search also works. The good thing with these is that you
can apply filters, and the stored value can differ from the indexed one.
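
As an illustration of the second technique, here is a minimal sketch of a
field type that indexes lowercased edge n-grams and lowercases the query.
The names autosuggest_edge and city_suggest and the gram sizes are
hypothetical, chosen only for this example; the fieldType would go in the
types section of schema.xml and the field in the fields section.

```xml
<!-- Hypothetical field type: the name and gram sizes are assumptions -->
<fieldType name="autosuggest_edge" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <!-- Keep the whole value as one token, then lowercase it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index every leading substring of the token: s, sa, san, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- Lowercase the typed text, but do not n-gram it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field: stored="true" keeps the original mixed-case value -->
<field name="city_suggest" type="autosuggest_edge" indexed="true" stored="true"/>
```

With this in place, a quoted query such as city_suggest:"San Fr" should go
through the query analyzer, become the single term "san fr", and match the
indexed gram, while the response carries the stored value in its original
case. Typed text longer than maxGramSize would not match any gram, so that
limit needs to suit the data.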

-- 
Regards,
Shalin Shekhar Mangar.

Re: Auto suggest.. how to do mixed case

Posted by Ingo Renner <in...@typo3.org>.
On 22.06.2009 at 11:09, Shalin Shekhar Mangar wrote:

Hi Shalin,

> I think
> that by naming it as /autoSuggest, a lot of users have been misled  
> since
> there are other techniques available.

what would you suggest?


Ingo

-- 
Ingo Renner
TYPO3 Core Developer, Release Manager TYPO3 4.2




Re: Auto suggest.. how to do mixed case

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Jun 19, 2009 at 12:50 PM, Ian Holsman <li...@holsman.net> wrote:

> I've noticed that one of the new features in Solr 1.4 is the TermsComponent,
> which enables autosuggest.
>

TermsComponent *can* be used for autosuggest, though I don't think that was
the original motivation. In the end it is just the same thing as a prefix
query, but it returns only the indexed tokens rather than the stored field
values. I think that by naming it /autoSuggest, a lot of users have been
misled, since there are other techniques available.
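
For reference, a sketch of how the component is typically wired up in
solrconfig.xml. The handler path /terms and the component name
termsComponent are assumptions for this example; the handler could equally
be registered under another path.

```xml
<!-- Register the component itself -->
<searchComponent name="termsComponent"
                 class="org.apache.solr.handler.component.TermsComponent"/>

<!-- Expose it through a handler that runs only this component -->
<requestHandler name="/terms"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <!-- Enable the terms component by default for this handler -->
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>
```

A request along the lines of /terms?terms.fl=name&terms.prefix=san (with
name standing in for whichever field is used) would then list the indexed
terms in that field starting with "san". Because these are indexed terms,
a lowercased field yields lowercased suggestions.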


>
> but what puzzles me is how to actually use it in an application.
>
> most autosuggests are case insensitive, so there is no difference if I type
> in 'San Francisco' or 'san francisco'.
>
> now I've tried with a 'text' field and a 'string' field, with no joy. the
> 'string' field gives the best results, but it is still case sensitive.
>
> at the moment I'm using a custom field type
>
>    <fieldType name="string_lc" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
>      <analyzer>
>        <!-- KeywordTokenizer does no actual tokenizing, so the entire
>             input string is preserved as a single token
>          -->
>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>        <!-- The LowerCase TokenFilter does what you expect, which can be
>             when you want your sorting to be case insensitive
>          -->
>        <filter class="solr.LowerCaseFilterFactory" />
>
>
>      </analyzer>
>    </fieldType>
>
> which converts the whole field to lower case, so I can submit the query in
> lower case and get good results.
>
> so the point of the email is to find out how do I get the autosuggest to
> return mixed case results, and not require me to lower case the query
> before
> I send it?
>

There is no way to do this right now using TermsComponent. You can index
lower case terms and store the mixed case terms. Then you can use a prefix
query which will return documents (and hence stored field values).
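
A minimal sketch of that setup, assuming a hypothetical field named
city_prefix that reuses the string_lc type from the original post. Analysis
only affects the indexed terms; the stored value is kept exactly as it was
submitted.

```xml
<!-- Hypothetical field: indexed terms are lowercased by string_lc,
     while stored="true" keeps the original mixed-case input -->
<field name="city_prefix" type="string_lc" indexed="true" stored="true"/>
```

A prefix query such as q=city_prefix:san*&fl=city_prefix would then return
matching documents together with the stored value, for example
"San Francisco". As far as I know, prefix terms are not passed through the
analyzer in this version, so the client would still need to lowercase the
typed text before sending it.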

-- 
Regards,
Shalin Shekhar Mangar.