Posted to solr-user@lucene.apache.org by Wendy2 <we...@rcsb.org> on 2018/04/18 19:12:14 UTC

Need help with searching on last name + middle initial

Hi Solr experts:

How can I make sure Solr doesn't drop the middle initial when I do a name
search?
I searched with double quotes for "Ellington, A.", but the Solr parser
dropped the middle initial, so I got both documents back (see below).
I even tried keeping "A." in the protwords.txt file, but that didn't work.
Are there any workarounds or suggestions?  Thanks!!!

*RESULTS:*
 "response":{"numFound":2,"start":0,"docs":[
      {
        "audit_author.name":"Azzi, A., Clark, S.A., Ellington, R.W., Chapman, M.S."},
      {
        "audit_author.name":"Ye, X., Gorin, A., Ellington, A.D., Patel, D.J."}]
  },

debugQuery mode indicates that Solr dropped the "A." when parsing the
query:
  "debug":{
    "rawquerystring":"\"Ellington, A.\"",
    "querystring":"\"Ellington, A.\"",
    "parsedquery":"(+DisjunctionMaxQuery(((entity_name_com.name:ellington)^20.0)))/no_coord",
    "parsedquery_toString":"+((entity_name_com.name:ellington)^20.0)",
    "QParser":"ExtendedDismaxQParser",



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Need help with searching on last name + middle initial

Posted by Wendy2 <we...@rcsb.org>.
The issue was resolved.

*I created a new fieldType:*
<fieldType name="pdb_text_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="200"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
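
To illustrate why a field type like this can tell "Ellington, A.D." apart from "Ellington, R.W.", here is a toy Python model of the idea: n-grams on the index side, plain lowercased tokens on the query side, and a phrase match over adjacent positions. It is only a sketch under stated assumptions. The regex tokenizer is a rough stand-in for StandardTokenizer, and the function names are hypothetical; none of this is Lucene's actual implementation.

```python
import re

def tokenize(text):
    # Rough stand-in for StandardTokenizer (assumption): keeps letters and
    # digits joined by inner periods ("A.D"), drops commas and trailing periods.
    return re.findall(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*", text)

def ngrams(token, min_n=1, max_n=200):
    # All substrings of the token with length between min_n and max_n.
    grams = set()
    for n in range(min_n, min(max_n, len(token)) + 1):
        for i in range(len(token) - n + 1):
            grams.add(token[i:i + n])
    return grams

def index_positions(text):
    # Index side: one set of lowercased n-grams per token position.
    return [{g.lower() for g in ngrams(tok)} for tok in tokenize(text)]

def query_terms(text):
    # Query side: tokenize and lowercase only, no n-grams.
    return [tok.lower() for tok in tokenize(text)]

def phrase_matches(doc, query):
    # True if the query terms line up with consecutive token positions.
    positions = index_positions(doc)
    terms = query_terms(query)
    for start in range(len(positions) - len(terms) + 1):
        if all(terms[i] in positions[start + i] for i in range(len(terms))):
            return True
    return False

doc1 = "Azzi, A., Clark, S.A., Ellington, R.W., Chapman, M.S."
doc2 = "Ye, X., Gorin, A., Ellington, A.D., Patel, D.J."
print(phrase_matches(doc1, "Ellington, A."))  # False
print(phrase_matches(doc2, "Ellington, A."))  # True
```

Note that in this toy model the query term "a" matches any following token that contains the letter "a", not only tokens that start with it.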


*A reference:*
https://opensourceconnections.com/blog/2013/08/21/name-search-in-solr/




Re: Need help with searching on last name + middle initial

Posted by Wendy2 <we...@rcsb.org>.
Hi Shawn,

The issue got resolved :-)  Thank you very much for your help!!

*I created a new fieldType:*
<fieldType name="pdb_text_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="200"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

*A reference:*
https://opensourceconnections.com/blog/2013/08/21/name-search-in-solr/




Re: Need help with searching on last name + middle initial

Posted by Wendy2 <we...@rcsb.org>.
Hi Shawn,

Thank you very much for your reply!
Per your suggestion, I re-indexed the data after removing the stopword
filter. It looks like Solr parsed the query correctly but didn't return any
results. Is there anything else I could try?  Thank you again!

=======debugQuery Output=========
{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "q":"\"Ellington, A.\"",
      "indent":"on",
      "fl":"audit_author.name",
      "wt":"json",
      "debugQuery":"true"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "debug":{
    "rawquerystring":"\"Ellington, A.\"",
    "querystring":"\"Ellington, A.\"",
    "parsedquery":"(+DisjunctionMaxQuery(((pdb_id:Ellington, A.)^5.0 |
(audit_author.name:\"ellington, a.\")^5.
.........
    "parsedquery_toString":"+((pdb_id:Ellington, A.)^5.0 |
(audit_author.name:\"ellington, a.\")^5.0 |
.........
 "QParser":"ExtendedDismaxQParser",

======Here is the analysis from Solr Admin UI======
  <http://lucene.472066.n3.nabble.com/file/t493740/Screenshot-25.png>   






Re: Need help with searching on last name + middle initial

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/18/2018 1:12 PM, Wendy2 wrote:
> debugQuery mode indicates that Solr dropped the "A." when parsing the
> query:
>   "debug":{
>     "rawquerystring":"\"Ellington, A.\"",
>     "querystring":"\"Ellington, A.\"",
>     "parsedquery":"(+DisjunctionMaxQuery(((entity_name_com.name:ellington)^20.0)))/no_coord",
>     "parsedquery_toString":"+((entity_name_com.name:ellington)^20.0)",
>     "QParser":"ExtendedDismaxQParser",

Very likely the period was removed by the tokenizer or a filter like
WordDelimiterFilter.  Then I would guess that the "a" was removed by a
StopFilter.
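
As a rough illustration of that guess, the following Python sketch imitates an analysis chain that strips punctuation, lowercases, and optionally removes stopwords. It is not Solr code: the regex tokenizer, the `analyze` function, and the stopword list are all assumptions made up for this example.

```python
import re

# Hypothetical stopword list; a real stopwords.txt may differ.
STOPWORDS = {"a", "an", "and", "the", "of"}

def analyze(text, use_stop_filter):
    # Tokenizer stand-in: keep runs of letters/digits, dropping "," and ".".
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # Lowercase filter.
    tokens = [t.lower() for t in tokens]
    # Optional stop filter: this is the step that would remove the initial "a".
    if use_stop_filter:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

print(analyze("Ellington, A.", use_stop_filter=True))   # ['ellington']
print(analyze("Ellington, A.", use_stop_filter=False))  # ['ellington', 'a']
```

With the stop filter on, only "ellington" survives, which is consistent with the parsedquery shown in the quoted debug output.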

Open your admin UI, choose your index from the dropdown, and click
"Analysis."  Then choose the entity_name_com.name field from the
dropdown, and type "Ellington, A." (without the quotes) in the "Query"
side.  When the analysis completes, you will be able to tell exactly 
what each step in your analysis chain is doing to the input.
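
If you prefer to script this check instead of clicking through the UI, Solr also exposes a field analysis request handler at /analysis/field. The sketch below only builds the request URL. The base URL and the core name "mycore" are placeholders (assumptions), and it assumes the handler is available in your installation.

```python
from urllib.parse import urlencode

def field_analysis_url(base_url, core, field_name, query_text):
    # Build a URL for Solr's field analysis handler, asking how the
    # query-side analyzer of `field_name` processes `query_text`.
    params = {
        "analysis.fieldname": field_name,
        "analysis.query": query_text,
        "wt": "json",
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"

# Placeholder host and core name; substitute your own.
url = field_analysis_url(
    "http://localhost:8983/solr", "mycore",
    "entity_name_com.name", "Ellington, A.",
)
print(url)
```

Fetching that URL with any HTTP client should return the token stream after each stage of the chain, the same information the Analysis screen displays.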

I would recommend NOT using a stopword filter.  Modern server hardware
usually has plenty of resources to handle indexes with stopwords still
included, and removing stopwords can cause certain search problems.

Thanks,
Shawn