Posted to users@solr.apache.org by Christopher Schultz <ch...@christopherschultz.net> on 2022/12/16 17:50:49 UTC
Retconn'ing Solr index schema
All,
I'm trying to determine why a change was made to my internal project
some years ago. The commit comment is unhelpful, but this field type was
added and then we changed our "username" field in the Solr index to use
this field-type:
"add-field-type" : {
  "name":"sortMe",
  "class":"solr.TextField",
  "analyzer":{
    "tokenizer":{
      "class":"solr.KeywordTokenizerFactory"
    },
    "filters":[{
      "class":"solr.LowerCaseFilterFactory"
    }]
  }
}
The "username" field contains (wait for it) the username for a user,
where each document in the index represents a user. We want to be able
to search for users given their usernames and also be able to sort based
upon the value.
I *think* the reason we changed this was because of the sorting. If you
have a username like "foo-bar-baz", then a word-splitting tokenizer would
break the value into separate terms, but we want to treat the whole thing
as one continuous string.
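To illustrate the difference, here is a rough Python sketch (not Solr code). Both helper functions are hypothetical stand-ins: `word_tokenize` only approximates a word-splitting tokenizer by breaking on non-alphanumeric characters, and `keyword_lowercase` mimics what KeywordTokenizerFactory followed by a lowercase filter is expected to produce.

```python
import re

def word_tokenize(value):
    # Hypothetical stand-in for a word-splitting tokenizer:
    # break on anything that is not a letter or digit, then lowercase.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]

def keyword_lowercase(value):
    # Stand-in for a keyword tokenizer plus a lowercase filter:
    # the whole input becomes a single lowercased term.
    return [value.lower()]

print(word_tokenize("Foo-Bar-Baz"))      # ['foo', 'bar', 'baz']
print(keyword_lowercase("Foo-Bar-Baz"))  # ['foo-bar-baz']
```

With a single term per value, there is exactly one string to sort on, which is what the "sortMe" field type appears to be aiming for.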
We want to do the same thing with email addresses, and we used this same
field-type for that purpose. For example, it's never useful to search
for "gmail" in email addresses because some huge percentage of users
come back. If you really want to search for all gmail users, we want you
to search for "*gmail*".
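The intended search behaviour can be sketched in plain Python (again, not Solr code). The sample addresses and the `search` helper are made up for illustration, and the sketch assumes the query text is lowercased the same way the indexed value is.

```python
from fnmatch import fnmatchcase

# Hypothetical indexed terms: one lowercased term per email address.
terms = ["alice@gmail.com", "bob@example.org", "carol@gmail.com"]

def search(pattern):
    # Assumption: the query side lowercases its input, like the index side.
    p = pattern.lower()
    # A pattern without wildcards must match the whole term.
    return [t for t in terms if fnmatchcase(t, p)]

print(search("gmail"))    # []
print(search("*Gmail*"))  # ['alice@gmail.com', 'carol@gmail.com']
```

Because each address is stored as one term, a bare "gmail" matches nothing, and only an explicit wildcard pulls in every gmail user.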
Will we likely achieve our goals with the field-type specified above?
Is there a reason to lowercase everything? Does that affect sorting? It
does not seem to affect searching.
Thanks,
-chris
Re: Retconn'ing Solr index schema
Posted by Alessandro Benedetti <a....@sease.io>.
Hi,
Apache Solr sorts string values in lexicographic order, so uppercase vs. lowercase matters!
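A small Python sketch of the effect, using made-up usernames. Python compares strings by code point, which for plain ASCII gives the same "all uppercase before all lowercase" ordering; treat it as an analogy for the behaviour described here, not as Solr's actual implementation.

```python
# Hypothetical usernames, as they might be stored without lowercasing.
usernames = ["bob", "Alice", "alice", "Zed"]

# Case-sensitive lexicographic order: 'Z' sorts before 'a'.
case_sensitive = sorted(usernames)
print(case_sensitive)    # ['Alice', 'Zed', 'alice', 'bob']

# Sorting on lowercased keys, similar in spirit to lowercasing at index time.
case_insensitive = sorted(usernames, key=str.lower)
print(case_insensitive)  # ['Alice', 'alice', 'bob', 'Zed']
```

Lowercasing the value before it is indexed is one way to get the second ordering.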
Cheers