Posted to solr-user@lucene.apache.org by Frankie <fr...@kleegroup.com> on 2011/06/28 11:14:55 UTC

Find results with or without whitespace

I'm looking for a way to index and search terms that may or may not contain
spaces.
An example explains it better:
- Looking for "healthcare", I want to find both "healthcare" and "health
care".
- Looking for "health care", I want to find both "health care" and
"healthcare".

My other constraints are:
- I will index rather long strings (extracted from Office documents)
- I want to avoid synonym lists (as they may be incomplete)
- I want to avoid specific logic (i.e. query rewriting with as many OR
clauses as the combinations of search terms require)
- I don't want to rely on an uppercase/lowercase tokenizer (as users are...
creative)

I have already tried many tokenizer/filter combinations without success,
and I have not found any answer to this problem.


--
View this message in context: http://lucene.472066.n3.nabble.com/Find-results-with-or-without-whitespace-tp3117144p3117144.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Find results with or without whitespace

Posted by roySolr <ro...@gmail.com>.
Frankie, have you fixed this issue? I'm interested in your solution.

--
View this message in context: http://lucene.472066.n3.nabble.com/Find-results-with-or-without-whitespace-tp3117144p3298298.html

Re: Find results with or without whitespace

Posted by Frankie <fr...@kleegroup.com>.
Thank you for your answer.

I agree, I can manage predictable values through synonyms.

However, most data in this index are company and product names, which
sometimes have rather strange syntax (mixed upper/lower case, misplaced
dashes or spaces). One purpose of using Solr was to help find potential
duplicates before data insertion.

On the other hand, I could write a custom tokenizer/filter and a custom query
builder that would test many combinations. However, I have the feeling that
this is an inefficient approach.
That is...
Indexing : "chelsea soccer club" =>
"chelsea","soccer","club","chelseasoccer","soccerclub","chelseasoccerclub"
Searching : "chelsea soccerclub" => ("chelsea" and "soccerclub") or
"chelseasoccerclub"
While search expressions are generally short, indexing long documents this
way will be a nightmare...
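
To make the idea above concrete, here is a minimal sketch in plain Python
(not a Solr tokenizer or filter). The function names and the max_size
parameter are hypothetical, introduced only for illustration, and lowercasing
is an assumption of the sketch. It emits every word plus every run of adjacent
words joined without a space, then applies the "all words, or the whole
expression glued together" test on the query side.

```python
def concatenated_shingles(text, max_size=3):
    """Return each word plus every run of 2..max_size adjacent words,
    joined without a space (hypothetical helper, not a Solr API)."""
    words = text.lower().split()
    tokens = []
    for size in range(1, max_size + 1):
        for start in range(len(words) - size + 1):
            tokens.append("".join(words[start:start + size]))
    return tokens


def matches(query, indexed_text, max_size=3):
    """True if every query word is an indexed token, or if the query
    with its spaces removed is an indexed token."""
    indexed = set(concatenated_shingles(indexed_text, max_size))
    words = query.lower().split()
    return all(w in indexed for w in words) or "".join(words) in indexed


index_tokens = concatenated_shingles("chelsea soccer club")
# ['chelsea', 'soccer', 'club', 'chelseasoccer', 'soccerclub',
#  'chelseasoccerclub']

print(matches("chelsea soccerclub", "chelsea soccer club"))  # True
print(matches("health care", "healthcare"))                  # True
print(matches("chelsea football", "chelsea soccer club"))    # False
```

With max_size capped, the number of tokens grows by roughly max_size per
input word, which is the indexing cost I am worried about for long documents.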


--
View this message in context: http://lucene.472066.n3.nabble.com/Find-results-with-or-without-whitespace-tp3117144p3117581.html

Re: Find results with or without whitespace

Posted by roySolr <ro...@gmail.com>.
I had the same problem:

http://lucene.472066.n3.nabble.com/Results-with-and-without-whitespace-soccer-club-and-soccerclub-td2934742.html#a2964942



--
View this message in context: http://lucene.472066.n3.nabble.com/Find-results-with-or-without-whitespace-tp3117144p3117386.html