Posted to users@solr.apache.org by Luca Foppiano <lu...@foppiano.org> on 2021/03/29 13:15:15 UTC
Fuzzy matching with
Hi all,
I’ve followed the tutorial on how to set up the SolrTextTagger [1], using a biomedical ontology instead of the GeoNames data.
I can send a request and get a response that captures all the exact terms:
curl -X POST 'http://localhost:8983/solr/bao_ontology/tag?overlaps=NO_SUB&tagsLimit=5000&fl=id,name,label&wt=json&indent=on&ignoreStopwords=true' -H 'Content-Type:text/plain' -d 'We are testing a malignant ependymoblastoma and a very interesting Bachem'
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "tagsCount":4,
  "tags":[[
  […]
  "response":{"numFound":3,"start":0,"numFoundExact":true,"docs":[
      {
        "label":["BAO_BIO_RB_syn"],
        "name":["Å"],
        "id":"http://purl.obolibrary.org/obo/UO_0000019"},
      {
        "label":["BAO_BIO_RB_term"],
        "name":["ependymoblastoma"],
        "id":"http://purl.obolibrary.org/obo/DOID_4794"},
      {
        "label":["BAO_BIO_RB_term"],
        "name":["Bachem"],
        "id":"http://www.bioassayontology.org/bao#BAO_0000858"}]
  }}
I was wondering whether it is possible to use fuzzy matching instead of strict matching. From the documentation, it doesn’t seem that the API offers more options than those listed in [2], but maybe it can be done programmatically?
Any idea is welcome.
Thank you in advance
Luca
[1] https://solr.apache.org/guide/8_0/the-tagger-handler.html#tutorial-with-geonames
[2] https://solr.apache.org/guide/8_0/the-tagger-handler.html#tagger-parameters
Re: Fuzzy matching with
Posted by Sujit Pal <su...@comcast.net>.
Hi Luca,
If you build the dictionary from stemmed (or lemmatized) terms and pass
your input document through the same transformation, you get matching
that is fuzzier than exact matching.
-sujit
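
A minimal sketch of the idea above, in plain Python rather than Solr: the dictionary terms and the input text go through the same normalization, so inflected or differently cased forms still match. The suffix-stripping normalize function is a toy stand-in for a real stemmer or lemmatizer, and the function names and the sample sentence are assumptions made for illustration only; the identifiers come from the response earlier in the thread.

```python
import re

# Toy normalizer standing in for a real stemmer or lemmatizer (an assumption:
# the thread does not say which one to use).
SUFFIXES = ("ing", "es", "s")


def normalize(token: str) -> str:
    """Lowercase a token and strip at most one common English suffix."""
    token = token.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_dictionary(terms: dict[str, str]) -> dict[str, str]:
    """Map the normalized form of each dictionary term to its identifier."""
    return {normalize(name): term_id for name, term_id in terms.items()}


def tag(text: str, dictionary: dict[str, str]) -> list[tuple[str, str]]:
    """Return (surface form, identifier) pairs for single-token matches."""
    matches = []
    for token in re.findall(r"\w+", text):
        term_id = dictionary.get(normalize(token))
        if term_id is not None:
            matches.append((token, term_id))
    return matches


# Two entries taken from the response shown earlier in the thread.
terms = {
    "ependymoblastoma": "http://purl.obolibrary.org/obo/DOID_4794",
    "Bachem": "http://www.bioassayontology.org/bao#BAO_0000858",
}
dictionary = build_dictionary(terms)

# Hypothetical input: a plural and an upper-cased variant of the two terms.
result = tag("Two ependymoblastomas and a sample from BACHEM", dictionary)
for surface, term_id in result:
    print(surface, "->", term_id)
```

This only handles single-token terms and a crude notion of stemming. In Solr itself, the equivalent would presumably be to apply the same stemming filter in both the index and query analyzers of the field type used by the tagger, but that is not confirmed in this thread.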