Posted to users@solr.apache.org by Luca Foppiano <lu...@foppiano.org> on 2021/03/29 13:15:15 UTC

Fuzzy matching with

Hi all, 
I’ve followed the tutorial [1] on how to set up the SolrTextTagger, using a biomedical ontology instead of the GeoNames data.

I can send a request and get a response that captures all the exact terms:

curl -X POST \
  'http://localhost:8983/solr/bao_ontology/tag?overlaps=NO_SUB&tagsLimit=5000&fl=id,name,label&wt=json&indent=on&ignoreStopwords=true' \
  -H 'Content-Type:text/plain' \
  -d 'We are testing a malignant ependymoblastoma and a very interesting Bachem'
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "tagsCount":4,
  "tags":[[
[…]
  "response":{"numFound":3,"start":0,"numFoundExact":true,"docs":[
      {
        "label":["BAO_BIO_RB_syn"],
        "name":["Å"],
        "id":"http://purl.obolibrary.org/obo/UO_0000019"},
      {
        "label":["BAO_BIO_RB_term"],
        "name":["ependymoblastoma"],
        "id":"http://purl.obolibrary.org/obo/DOID_4794"},
      {
        "label":["BAO_BIO_RB_term"],
        "name":["Bachem"],
        "id":"http://www.bioassayontology.org/bao#BAO_0000858"}]
  }}

I was wondering whether it is possible to use fuzzy matching instead of strict matching. From the documentation, it doesn’t seem that the API offers more options than those listed in [2], but maybe it can be done programmatically?

Any idea is welcome. 

Thank you in advance 
Luca


[1] https://solr.apache.org/guide/8_0/the-tagger-handler.html#tutorial-with-geonames
[2] https://solr.apache.org/guide/8_0/the-tagger-handler.html#tagger-parameters


Re: Fuzzy matching with

Posted by Sujit Pal <su...@comcast.net>.
Hi Luca,

If you build the dictionary from stemmed (or lemmatized) terms and pass
your input document through the same transformation, you can get fuzzier
matching than an exact match.

-sujit
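Sujit's suggestion can be sketched in Solr terms against the schema from the Tagger Handler tutorial [1]. This is a minimal, untested sketch that assumes the core is named `bao_ontology` (as in the request above) and that the dictionary field uses the tutorial's `tag` field type with the tutorial's settings. It replaces that type through the Schema API so that the same stemmer, `solr.PorterStemFilterFactory`, runs in both the index analyzer (which builds the dictionary) and the query analyzer (which processes the input text). Porter stemming is an assumption here; a lighter stemmer such as `solr.KStemFilterFactory`, or a lemmatizer, would follow the same pattern.

```shell
# Sketch only: assumes the core "bao_ontology" and the "tag" field type
# created as in the Tagger Handler tutorial; adjust to your own schema.
# The Porter stemmer is added to BOTH analyzers so that dictionary entries
# and tagged input text are normalized the same way.
curl -X POST -H 'Content-type:application/json' \
  'http://localhost:8983/solr/bao_ontology/schema' -d '{
  "replace-field-type":{
    "name":"tag",
    "class":"solr.TextField",
    "postingsFormat":"FST50",
    "omitNorms":true,
    "omitTermFreqAndPositions":true,
    "indexAnalyzer":{
      "tokenizer":{"class":"solr.StandardTokenizerFactory"},
      "filters":[
        {"class":"solr.EnglishPossessiveFilterFactory"},
        {"class":"solr.ASCIIFoldingFilterFactory"},
        {"class":"solr.LowerCaseFilterFactory"},
        {"class":"solr.PorterStemFilterFactory"},
        {"class":"solr.ConcatenateGraphFilterFactory", "preservePositionIncrements":false}
      ]},
    "queryAnalyzer":{
      "tokenizer":{"class":"solr.StandardTokenizerFactory"},
      "filters":[
        {"class":"solr.EnglishPossessiveFilterFactory"},
        {"class":"solr.ASCIIFoldingFilterFactory"},
        {"class":"solr.LowerCaseFilterFactory"},
        {"class":"solr.PorterStemFilterFactory"}
      ]}
  }
}'
```

Because the tagger's dictionary is built from the indexed terms, the ontology documents have to be reindexed after the field type changes. Note that stemming only folds inflectional variants together (for example, regular plural forms); it does not tolerate misspellings the way edit-distance fuzzy matching does.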

