Posted to solr-user@lucene.apache.org by Wendy2 <we...@rcsb.org> on 2018/04/18 18:35:53 UTC

How to protect middle initials during search

Hi fellow Users,

Why did Solr return "Ellington, R.W." when I did a name search for
"Ellington, A."?
I even added "A." to the protwords.txt file, but the debugQuery output shows that the
middle initial got dropped in the parsedquery.
How can I make Solr NOT drop the middle initial? Thanks for your help!!
 
======Search results========
Ellington, A.D.
Ellington, R.W.

=======debugQuery=========
{
  "responseHeader":{
    "status":0,
    "QTime":51,
    "params":{
      "q":"\"Ellington, A.\"",
      "indent":"on",
      "fl":"audit_author.name",
      "wt":"json",
      "debugQuery":"true"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "audit_author.name":"Azzi, A., Clark, S.A., Ellington, R.W., Chapman, M.S."},
      {
        "audit_author.name":"Ye, X., Gorin, A., Ellington, A.D., Patel, D.J."}]
  },
  "debug":{
    "rawquerystring":"\"Ellington, A.\"",
    "querystring":"\"Ellington, A.\"",
    "parsedquery":"(+DisjunctionMaxQuery(((entity_name_com.name:ellington)^20.0)))/no_coord",
    "parsedquery_toString":"+((entity_name_com.name:ellington)^20.0)",
    "QParser":"ExtendedDismaxQParser",




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: How to protect middle initials during search

Posted by Wendy2 <we...@rcsb.org>.
Hi Alessandro,

Thank you very much for your reply!

I got the issue resolved based on the suggestion from the article below:
https://opensourceconnections.com/blog/2013/08/21/name-search-in-solr/

*I created a new fieldType:*
<fieldType name="pdb_text_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Index every substring (1 to 200 characters) of each token -->
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="200"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No stop filter and no n-grams at query time, so "A" is kept as a whole term -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
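To see why this field type separates the two authors, here is a rough Python sketch of its two analysis chains. It is an approximation, not Solr's code: the regex only imitates how StandardTokenizer splits these particular author strings (keeping "A.D" together, dropping commas and trailing periods), and it assumes, as NGramFilter does in recent Lucene versions, that all grams of a token share that token's position. The phrase check is a simplified stand-in for a Lucene phrase query.

```python
import re

def tokenize(text):
    # Rough imitation of StandardTokenizer for these strings (assumption):
    # keeps internal periods as in "A.D", drops commas and trailing periods.
    return re.findall(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*", text)

def ngrams(token, min_size=1, max_size=200):
    # Every substring whose length is in [min_size, max_size].
    grams = set()
    for n in range(min_size, min(max_size, len(token)) + 1):
        for i in range(len(token) - n + 1):
            grams.add(token[i:i + n])
    return grams

def index_analyze(text):
    # Index chain: tokenize -> n-grams -> lowercase.
    # One set per position: all grams of a token sit at that token's position.
    return [{g.lower() for g in ngrams(tok)} for tok in tokenize(text)]

def query_analyze(text):
    # Query chain: tokenize -> lowercase (no n-grams, no stop filter).
    return [tok.lower() for tok in tokenize(text)]

def phrase_matches(doc_text, query_text):
    # Simplified phrase match: query terms must appear at consecutive positions.
    positions = index_analyze(doc_text)
    terms = query_analyze(query_text)
    for start in range(len(positions) - len(terms) + 1):
        if all(term in positions[start + k] for k, term in enumerate(terms)):
            return True
    return False

docs = [
    "Azzi, A., Clark, S.A., Ellington, R.W., Chapman, M.S.",
    "Ye, X., Gorin, A., Ellington, A.D., Patel, D.J.",
]
for d in docs:
    print(phrase_matches(d, "Ellington, A."), d)
```

Under this sketch the phrase "ellington a" only matches when the token right after "ellington" contains the gram "a", so "Ellington, A.D." matches and "Ellington, R.W." does not. The same rule means a name such as "Ellington, B.A." would also match, which is the trade-off of indexing every substring.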




Re: How to protect middle initials during search

Posted by Alessandro Benedetti <a....@sease.io>.
Hi Wendy,
I recommend properly configuring your analysis chain.
You can start by posting it here, and we can help.

Generally speaking, you should first use the Analysis tool in the Solr Admin UI to
verify that the analysis chain is configured as you expect; then you can
move on to modelling the query appropriately.
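As far as we know, the Admin UI's Analysis screen is backed by Solr's field analysis request handler, which can also be called directly. A minimal Python sketch that only builds the request URL, assuming the handler is available at /analysis/field (it is registered implicitly in recent Solr versions) and using a hypothetical core name, mycore:

```python
from urllib.parse import urlencode

def field_analysis_url(base_url, field_type, value, query):
    """Build a request for Solr's field analysis handler (/analysis/field).

    base_url is a core URL such as http://localhost:8983/solr/mycore
    (the core name here is hypothetical).
    """
    params = {
        "analysis.fieldtype": field_type,  # analyze with this field type's chains
        "analysis.fieldvalue": value,      # text run through the index-time chain
        "analysis.query": query,           # text run through the query-time chain
        "analysis.showmatch": "true",      # mark index tokens that match query tokens
        "wt": "json",
    }
    return base_url.rstrip("/") + "/analysis/field?" + urlencode(params)

url = field_analysis_url(
    "http://localhost:8983/solr/mycore",  # hypothetical core
    "pdb_text_name",
    "Ye, X., Gorin, A., Ellington, A.D., Patel, D.J.",
    "Ellington, A.",
)
print(url)
```

Fetching that URL (for example with urllib.request.urlopen) should return the tokens produced by each stage of the index and query chains, which is the same information the Analysis screen displays.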

Cheers




-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: How to protect middle initials during search

Posted by Wendy2 <we...@rcsb.org>.
Hi Jay,

Thank you very much for your reply!
I re-indexed the data after removing the stopword
filter. It looks like Solr parsed the query correctly, but it didn't return any
results. Is there anything else I could try? Thank you again!

=======debugQuery Output=========
{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "q":"\"Ellington, A.\"",
      "indent":"on",
      "fl":"audit_author.name",
      "wt":"json",
      "debugQuery":"true"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "debug":{
    "rawquerystring":"\"Ellington, A.\"",
    "querystring":"\"Ellington, A.\"",
    "parsedquery":"(+DisjunctionMaxQuery(((pdb_id:Ellington, A.)^5.0 | (audit_author.name:\"ellington, a.\")^5.
.........
    "parsedquery_toString":"+((pdb_id:Ellington, A.)^5.0 | (audit_author.name:\"ellington, a.\")^5.0 |
.........
    "QParser":"ExtendedDismaxQParser",





Re: How to protect middle initials during search

Posted by Walter Underwood <wu...@wunderwood.org>.
Or even better, don’t remove stopwords.

Stopwords are a technique invented for 16-bit machines, where common words made posting lists too long to handle.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 18, 2018, at 2:20 PM, Jay Potharaju <js...@gmail.com> wrote:
> 
> "A" is in the stopword list; that is why it got dropped. Protected words
> only protect a term from stemming.
> 
> https://lucene.apache.org/solr/guide/6_6/language-analysis.html
> 
> Thanks
> Jay Potharaju
> 
> 
> On Wed, Apr 18, 2018 at 11:35 AM, Wendy2 <we...@rcsb.org> wrote:
> 
>> Hi fellow Users,
>> 
>> Why did Solr return "Ellington, W.R." when I did a name search for
>> "Ellington, A."?
>> I even added "A." in the protwords.txt file. The debugQuery shows that the
>> middle initial got dropped in the parsedquery.
>> How can I make Solr NOT to drop the middle initial?  Thanks for your
>> help!!
>> 
>> ======Search results========
>> Ellington, A.D.
>> Ellington, R.W..
>> 
>> =======debugQuery=========
>> {
>>  "responseHeader":{
>>    "status":0,
>>    "QTime":51,
>>    "params":{
>>      "q":"\"Ellington, A.\"",
>>      "indent":"on",
>>      "fl":"audit_author.name",
>>      "wt":"json",
>>      "debugQuery":"true"}},
>>  "response":{"numFound":2,"start":0,"docs":[
>>      {
>>        "audit_author.name":"Azzi, A., Clark, S.A., Ellington, R.W.,
>> Chapman, M.S."},
>>      {
>>        "audit_author.name":"Ye, X., Gorin, A., Ellington, A.D., Patel,
>> D.J."}]
>>  },
>>  "debug":{
>>    "rawquerystring":"\"Ellington, A.\"",
>>    "querystring":"\"Ellington, A.\"",
>> 
>> "parsedquery":"(+DisjunctionMaxQuery(((entity_name_com.name:
>> ellington)^20.0)))/no_coord",
>>    "parsedquery_toString":"+((entity_name_com.name:ellington)^20.0)",
>>   "QParser":"ExtendedDismaxQParser",
>> 
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>> 


Re: How to protect middle initials during search

Posted by Jay Potharaju <js...@gmail.com>.
"A" is in the stopword list; that is why it got dropped. Protected words
only protect a term from stemming.

https://lucene.apache.org/solr/guide/6_6/language-analysis.html
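The point about stopwords versus protected words can be sketched in Python. This is a toy model, not Lucene's implementation: the stopword and protected-word sets, the regex tokenizer, and the trailing-"s" stemmer are all hypothetical simplifications. It shows that the stop filter removes "a" without ever consulting the protected list, and that protection only keeps a token (here "jones") from being stemmed. Passing use_stop_filter=False corresponds to Walter's suggestion of not removing stopwords at all.

```python
import re

# Hypothetical, much-reduced stand-ins for stopwords.txt and protwords.txt.
STOPWORDS = {"a", "an", "and", "of", "the"}
PROTECTED = {"a", "jones"}

def tokenize(text):
    # Rough imitation of StandardTokenizer for these strings (assumption):
    # drops commas and trailing periods, keeps internal ones as in "A.D".
    return re.findall(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*", text)

def toy_stem(token):
    # Hypothetical stemmer that strips a trailing "s"; not Porter or Snowball.
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def analyze(text, use_stop_filter=True):
    tokens = [t.lower() for t in tokenize(text)]
    if use_stop_filter:
        # The stop filter never looks at the protected list, so "a" is removed here.
        tokens = [t for t in tokens if t not in STOPWORDS]
    # Protected words only skip the stemming step.
    return [t if t in PROTECTED else toy_stem(t) for t in tokens]

print(analyze("Ellington, A."))                         # ['ellington']
print(analyze("Jones, A."))                             # ['jones']
print(analyze("Ellington, A.", use_stop_filter=False))  # ['ellington', 'a']
```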

Thanks
Jay Potharaju

