Posted to solr-dev@lucene.apache.org by "Guillaume Smet (JIRA)" <ji...@apache.org> on 2008/07/14 14:15:32 UTC

[jira] Created: (SOLR-629) Fuzzy search with DisMax request handler

Fuzzy search with DisMax request handler
----------------------------------------

                 Key: SOLR-629
                 URL: https://issues.apache.org/jira/browse/SOLR-629
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 1.3
            Reporter: Guillaume Smet
            Priority: Minor


The DisMax search handler doesn't support fuzzy queries, which would be quite useful for our use of Solr; from what I've seen on the list, several other people would like to have them as well.

Following this discussion http://markmail.org/message/tx6kqr7ga6ponefa#query:solr%20dismax%20fuzzy+page:1+mid:c4pciq6rlr4dwtgm+state:results , I added the ability to specify fuzzy query fields in the qf parameter. I kept the patch as conservative as possible.

The syntax supported is: fieldOne^2.3 fieldTwo~0.3 fieldThree~0.2^-0.4 fieldFour as discussed in the above thread.
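As a rough illustration of the proposed qf syntax, here is a hypothetical Java sketch that splits a qf value into a field name, an optional minSimilarity (after ~) and an optional boost (after ^). It is not the code from dismax_fuzzy_query_field.v0.1.diff; the class name QfFieldSpec and the assumed order (~ before ^) are inferred only from the example string above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of parsing "field~minSim^boost" entries; not the patch's code.
class QfFieldSpec {
    final String field;
    final Float minSimilarity; // null when the field is not fuzzy
    final Float boost;         // null when no boost is given

    QfFieldSpec(String field, Float minSimilarity, Float boost) {
        this.field = field;
        this.minSimilarity = minSimilarity;
        this.boost = boost;
    }

    // Assumed grammar: name, then optional "~minSim", then optional "^boost".
    // Numbers may be negative or start with a dot (e.g. ".06").
    private static final Pattern SPEC = Pattern.compile(
        "([^~^\\s]+)(?:~(-?[0-9]*\\.?[0-9]+))?(?:\\^(-?[0-9]*\\.?[0-9]+))?");

    static List<QfFieldSpec> parse(String qf) {
        List<QfFieldSpec> specs = new ArrayList<>();
        for (String token : qf.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty qf value
            }
            Matcher m = SPEC.matcher(token);
            if (!m.matches()) {
                throw new IllegalArgumentException("Malformed qf entry: " + token);
            }
            Float sim = m.group(2) == null ? null : Float.valueOf(m.group(2));
            Float boost = m.group(3) == null ? null : Float.valueOf(m.group(3));
            specs.add(new QfFieldSpec(m.group(1), sim, boost));
        }
        return specs;
    }
}
```

The sketch accepts negative values such as ^-0.4, matching the example string; rejecting them would take a single extra check after parsing.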

Recursive query aliasing should also work with fuzzy query fields, using a very simple rule: aliased fields inherit the minSimilarity of their parent, combined with their own if they define one.

Only the qf parameter supports this syntax at the moment. I suppose we should make it usable in pf too. Any opinion?

Comments are very welcome; I'll spend the time needed to get this patch into good shape.

Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-629) Fuzzy search with DisMax request handler

Posted by "Mikelis Zalais (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688693#action_12688693 ] 

Mikelis Zalais commented on SOLR-629:
-------------------------------------

Hi, is there any progress with this?



[jira] Commented: (SOLR-629) Fuzzy search with DisMax request handler

Posted by "Guillaume Smet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688784#action_12688784 ] 

Guillaume Smet commented on SOLR-629:
-------------------------------------

Hi Otis,

The proposed syntax is taken from the unit test, which is based on the existing one (see testParseFieldBoosts() in http://svn.apache.org/viewvc/lucene/solr/trunk/src/test/org/apache/solr/util/SolrPluginUtilsTest.java?revision=701485&view=markup ). The existing test contains a negative boost, and so does the new one: I didn't change the way Solr parses boost values.
Perhaps we need to be stricter about it?

There is still an unanswered question from my initial proposal:
"Only the qf parameter supports this syntax atm. I suppose we should make it usable in pf too. Any opinion?"

That said, it's probably better to validate the general approach of the patch before thinking about generalizing it.

-- 
Guillaume



[jira] Updated: (SOLR-629) Fuzzy search with DisMax request handler

Posted by "Guillaume Smet (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guillaume Smet updated SOLR-629:
--------------------------------

    Attachment: dismax_fuzzy_query_field.v0.1.diff

Here is the same patch updated to trunk to resolve a few conflicts.

It would be nice to have some feedback as it could be a nice enhancement for DisMax in Solr 1.4. I can rework it if needed.

We have been running several instances of Solr with this patch for more than 8 months now, as we really needed fuzzy search with DisMax.

Thanks.



[jira] Commented: (SOLR-629) Fuzzy search with DisMax request handler

Posted by "Chris Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704450#action_12704450 ] 

Chris Williams commented on SOLR-629:
-------------------------------------

Hi,
FYI: the patch didn't seem to apply cleanly on 1.3, but it worked fine on 1.4.

Anyway, I'm having some trouble with this patch: it doesn't seem to respect any of my field's analysis filters.

For example, I have a dismax query
where q=the game
and qf='title_words~.06'

where the 'title_words' field type is:
    <fieldType name="textExactWSTokenized" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

I get this as the parsed query:
"parsedquery_toString"=>"+(((title_words:the~0.6)~0.01 (title_words:game~0.6)~0.01)~2) ()"
(I don't want it running anything on the word 'the' because it's a stop word.)

Yet if I change qf to just 'title_words' (no fuzziness) and keep the same query text, I get this:
"parsedquery_toString"=>"+(((title_words:game)~0.01)~1) ()"
(which is what I want)
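To make the difference between the two parsed queries concrete, here is a small plain-Java simulation. It assumes (this is not confirmed in the thread) that fuzzy terms are only lowercased by the query parser and never pass through the field's analyzer, so StopFilterFactory never sees "the"; it also assumes stopwords.txt contains "the". The class and method names are hypothetical, and the other filters of textExactWSTokenized are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical simulation of query-time term handling; not Solr or Lucene code.
class FuzzyStopWordDemo {
    // Stand-in for stopwords.txt (assumed to contain "the").
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Normal (non-fuzzy) terms: whitespace split, lowercase, stop words dropped,
    // roughly like the title_words analyzer chain.
    static List<String> analyzedTerms(String q) {
        List<String> out = new ArrayList<>();
        for (String token : q.trim().split("\\s+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }

    // Fuzzy terms (assumed): lowercased only, never passed to the stop filter,
    // so "the" survives and becomes a clause of its own.
    static List<String> fuzzyTerms(String q) {
        List<String> out = new ArrayList<>();
        for (String token : q.trim().split("\\s+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty()) {
                out.add(term);
            }
        }
        return out;
    }

    // Assuming mm=100% (the DisMax default), every generated clause is required.
    static int requiredClauses(List<String> clauses) {
        return clauses.size();
    }
}
```

The analyzed path yields one clause and the fuzzy path two, which lines up with the ~1 and ~2 in the two dumps. If those numbers are DisMax's minimum-should-match at its 100% default, both fuzzy clauses are required: when "the" was removed at index time, title_words:the~0.6 can only match some other indexed term that is similar enough, and otherwise the whole query returns nothing.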




[jira] Updated: (SOLR-629) Fuzzy search with DisMax request handler

Posted by "Guillaume Smet (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guillaume Smet updated SOLR-629:
--------------------------------

    Attachment: dismax_fuzzy_query_field.v0.1.diff



[jira] Commented: (SOLR-629) Fuzzy search with DisMax request handler

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688775#action_12688775 ] 

Otis Gospodnetic commented on SOLR-629:
---------------------------------------

Mikelis: have you tried it? Does it work well and as described? Please do, and leave your feedback here (or fixes in the form of another patch).


I haven't looked at the patch, but I like the example syntax.

Question about "fieldThree~0.2^-0.4" -- is that a negative boost?  huh?




[jira] Commented: (SOLR-629) Fuzzy search with DisMax request handler

Posted by "Guillaume Smet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704491#action_12704491 ] 

Guillaume Smet commented on SOLR-629:
-------------------------------------

bq. FYI: the patch didn't seem to apply cleanly on 1.3, but worked fine on 1.4

The old version of the patch, which is still attached, should work with 1.3; at least, I use it on a pre-1.3 version.

The new one is rebased on 1.4 but is the exact same patch.

{quote}
I get this as the parsed query:
"parsedquery_toString"=>"+(((title_words:the~0.6)~0.01 (title_words:game~0.6)~0.01)~2) ()"
(I don't want it running anything on the word 'the' because it's a stop word)
{quote}

AFAIK, that's the standard behaviour for fuzzy (and wildcard) queries. The stop word isn't removed because the~0.06 != the: it might match another word.

Could any Solr guy confirm?

Note that 0.06 is really too low IMHO. I usually use 0.8 or 0.7 for fuzziness.
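To get a feel for what a minSimilarity of 0.06 versus 0.6 or 0.8 means, the sketch below computes an edit-distance-based similarity. It assumes the formula used by Lucene 2.x's FuzzyTermEnum with no prefix, similarity = 1 - distance / min(len(term), len(candidate)), and a strict "greater than minSimilarity" test; check the FuzzyQuery source of the Lucene version you actually run before relying on it. The class and method names are hypothetical, and empty-string handling is simplified.

```java
// Sketch of a Lucene 2.x-style fuzzy similarity (assumed formula, no prefix).
class FuzzySimilarity {
    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // after the swap, prev holds the row just computed
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed: 1 - distance / length of the shorter string.
    static float similarity(String query, String candidate) {
        int shorter = Math.min(query.length(), candidate.length());
        if (shorter == 0) {
            return query.equals(candidate) ? 1.0f : 0.0f; // simplified edge case
        }
        return 1.0f - (float) levenshtein(query, candidate) / shorter;
    }

    // Assumed strict comparison against the threshold given after "~".
    static boolean matches(String query, String candidate, float minSimilarity) {
        return similarity(query, candidate) > minSimilarity;
    }
}
```

Under these assumptions, "the" and "she" (one edit apart) score about 0.67, so they match at 0.6 but not at 0.7, while "the" and "tea" (two edits apart) score about 0.33 and only match at a threshold as low as 0.06. That fits the advice above to stay around 0.7 or 0.8, especially for short terms.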

-- 
Guillaume



[jira] Commented: (SOLR-629) Fuzzy search with DisMax request handler

Posted by "Chris Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704525#action_12704525 ] 

Chris Williams commented on SOLR-629:
-------------------------------------

Sorry, that was a typo: I was using 0.6 for the fuzziness, not 0.06.

(I have about a week and a half of experience with Solr right now, so bear with me.)
Assuming you're right that this is the default behavior, is there an alternative way to get it to work? With the example above, any fuzzy search that contains a stop word returns no results. What kind of field type do you run fuzzy search on? Do you basically just run it on a field that has no filters?

thanks,
Chris
