Posted to solr-user@lucene.apache.org by Deepak Udapudi <DU...@delta.org> on 2018/01/30 22:48:40 UTC

full name free text search problem

Hi all,

I have the below scenario in full name search that we are trying to implement.

Solr configuration :-

<fieldType name="keywords_text" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>


<field name="keywords" type="keywords_text" indexed="true" stored="false" multiValued="true" />
  <copyField source="fullName" dest="keywords" docValues="true" />
  <copyField source="officeName" dest="keywords" docValues="true" />
  <copyField source="facilityName" dest="keywords" docValues="true" />

Scenario :-

The Solr configuration copies the office name, facility name and full name into one field, as displayed above.
We search on the input name and sort the matching records by distance.

Problem :-

I am getting the records matching the full name, sorted by distance.
If the input string is, for example, Dae Kim, I also get records other than Dae Kim (for example Rodney Kim) near the top of the search results,
placed just before the next Dae Kim record, because Kim matches in all the fields (full name, facility name and office name). So the hit frequency is high, and its
distance is smaller than that of the next Dae Kim record in the search results.

Expected results :-

I want all the records for Dae Kim to appear at the top of the search results, sorted by distance, without any irrelevant results.

Queries :-

What is the fix for the above problem if anyone has faced it?
How do I handle the problem?

Any inputs would be highly appreciated.

Thanks in advance.

Regards,
Deepak




The information contained in this email message and any attachments is confidential and intended only for the addressee(s). If you are not an addressee, you may not copy or disclose the information, or act upon it, and you should delete it entirely from your email system. Please notify the sender that you received this email in error.

Re: full name free text search problem

Posted by Alessandro Benedetti <a....@sease.io>.
"I am getting the records matching the full name sorted by distance. 
If the input string (for ex Dae Kim) is provided, I am getting the records
other than Dae Kim (for ex Rodney Kim) too at the top of the search results
including Dae Kim just before the next Dae Kim because Kim is matching with
all the fields like full name, facility name and the office name. So, the hit
frequency is high and its distance is less compared to the next Dae Kim in
the search results with higher distance."

This is all quite confusing.
First of all, by "sorted by distance", do you mean sorted by string distance?
Or by spatial (geographic) distance?
You are analysing the fields without tokenization and then putting
everything in the same multivalued field.
This means you are only going to get exact matches.
You also lose the semantics of the source field (which could have given a
different score boost depending on the field).

If you want to sort or score by a string distance, you need to use function
query sorting or boosting [1].
In particular, you are interested in strdist (you will find the details in the
page linked).
If it is geographical distance, take a look at the spatial module [2].

Regards

[1] https://lucene.apache.org/solr/guide/6_6/function-queries.html
[2] https://lucene.apache.org/solr/guide/6_6/spatial-search.html
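
For the geographical case, a minimal sketch of the kind of setup described in [2] might look like the fragments below. The field name "location", the field type name "latlon" and the handler name "/nearby" are assumptions made for illustration; none of them come from the original configuration. The reference point would be supplied by the client on each request as pt=lat,lon.

```xml
<!-- schema fragment: a lat/lon point field
     (the names "latlon" and "location" are hypothetical) -->
<fieldType name="latlon" class="solr.LatLonPointSpatialField" docValues="true"/>
<field name="location" type="latlon" indexed="true" stored="true"/>

<!-- solrconfig.xml fragment: hypothetical handler that sorts matches
     by their distance from the point passed in the pt parameter -->
<requestHandler name="/nearby" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- spatial field that geodist() reads when called without arguments -->
    <str name="sfield">location</str>
    <!-- nearest documents first -->
    <str name="sort">geodist() asc</str>
  </lst>
</requestHandler>
```

Sorting this way only orders the documents that matched the query, so it does not by itself remove records such as Rodney Kim; that still depends on how the name fields are analysed and queried.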



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: full name free text search problem

Posted by Rahul Sood <ra...@gmail.com>.
Hi Deepak,
Look at the score of each document in your results.
You can see how it was computed in debug mode.
Rahul.
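
One way to follow this suggestion is to have Solr return the score and its explanation with every response. A minimal sketch for solrconfig.xml, where the handler name "/select-debug" is an assumption:

```xml
<!-- hypothetical handler that always returns scores and debug output -->
<requestHandler name="/select-debug" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- return the relevance score alongside the stored fields -->
    <str name="fl">*,score</str>
    <!-- add the debug section, including a score explanation per document -->
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>
```

The same output is available on any handler by adding debugQuery=true to a single request. The explanation for a record such as Rodney Kim shows which terms matched in which field.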



-- 
"Learning is not necessary, neither is survival"

Re: full name free text search problem

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
You need to tokenize the full name in several different ways and then
search all of the tokenized versions with different boosts.

This way you can tokenize as a full string (perhaps lowercased), then
also on white space, and then maybe even with phonetic mapping to catch
alternative spellings.

You can see something similar in:
https://gist.github.com/arafalov/5e04884e5aefaf46678c

Regards,
   Alex.
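
The approach above could be sketched roughly as follows. This is an illustration under assumptions, not the configuration from the linked gist: every field, field type and handler name is hypothetical, fullName is assumed to be single-valued, and the boost values are arbitrary starting points.

```xml
<!-- schema fragment: three differently analysed copies of the name -->

<!-- 1. whole name as one lowercased token, for exact full-name matches -->
<fieldType name="name_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- 2. split on white space, so "dae" and "kim" are separate terms -->
<fieldType name="name_words" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- 3. phonetic codes only, to catch alternative spellings -->
<fieldType name="name_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<field name="fullName_exact" type="name_exact" indexed="true" stored="false"/>
<field name="fullName_words" type="name_words" indexed="true" stored="false"/>
<field name="fullName_phonetic" type="name_phonetic" indexed="true" stored="false"/>

<copyField source="fullName" dest="fullName_exact"/>
<copyField source="fullName" dest="fullName_words"/>
<copyField source="fullName" dest="fullName_phonetic"/>

<!-- solrconfig.xml fragment: search all versions with different boosts -->
<requestHandler name="/names" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- per-word matching: plain spelling outranks phonetic matches -->
    <str name="qf">fullName_words^3 fullName_phonetic</str>
    <!-- whole query as a phrase; on the keyword-tokenized field this
         should reward records whose full name equals the input -->
    <str name="pf">fullName_exact^20</str>
    <!-- require every query word to match somewhere, so a query for
         "Dae Kim" should not return a record that only contains "Kim" -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

Keeping the copies in separate fields, instead of one shared multivalued field, is what allows a different boost per version.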
