Posted to solr-user@lucene.apache.org by James <ja...@ohrt.info> on 2018/01/18 11:29:21 UTC

Documentation Slop (DisMax parser)

Hi:

There seems to be an error in the documentation about the slop parameter ps
used by the eDisMax parser. It reads:

"This means that if the terms "foo" and "bar" appear in the document with
less than 10 terms between each other, the phrase will match."

Counterexample:

"Foo one two three four five six seven eight nine bar" will not match with
ps=10

 

It seems that it must be "less than 9".
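
To make the counterexample easier to reproduce, here is a minimal Python sketch of the request parameters involved. The parameter names (defType, q, qf, pf, ps) are the usual eDisMax ones; the field name "text" and the host/core in the comment are assumptions for illustration only and do not come from this thread.

```python
from urllib.parse import urlencode

# Hypothetical field name; substitute the field you actually index.
FIELD = "text"

params = {
    "defType": "edismax",  # use the eDisMax query parser
    "q": "foo bar",        # the two query terms from the example
    "qf": FIELD,           # field searched for the individual terms
    "pf": FIELD,           # field used for the phrase boost
    "ps": 10,              # phrase slop applied to the pf phrase
}

query_string = urlencode(params)
# Assumed endpoint, shown only as an example:
#   http://localhost:8983/solr/mycore/select?<query_string>
print(query_string)
```

Indexing the counterexample sentence into that field and varying ps should show where the phrase boost starts to apply.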

However, when more query terms are used it gets complicated when one tries
to count words in between.

Easier to understand (and correct according to my testing) would be
something like:

 

"This means that if the terms "foo" and "bar" appear in the document within
a group of 10 or fewer terms, the phrase will match. For example, a document
that says:

*Foo* term1 term2 term3 *bar*

will match the phrase query. A document that says

*Foo* term1 term2 term3 term4 term5 term6 term7 term8 term9 *bar* 

will not (because the search terms are within a group of 11 terms).

Note: If any search term is a MUST-NOT term, the phrase slop query will
never match.

"
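
The "group of N terms" rule proposed above can be sketched in a few lines of Python. This only illustrates the wording suggested in this message, using naive whitespace tokenization; it is not Solr's or Lucene's actual slop computation, and the function name within_group is made up for the example.

```python
def within_group(doc, terms, group_size):
    """Return True if all terms occur inside some run of at most
    group_size consecutive tokens of doc (naive whitespace tokens)."""
    tokens = doc.lower().split()
    wanted = {t.lower() for t in terms}
    for start in range(len(tokens)):
        # Slide a window of group_size tokens across the document.
        window = set(tokens[start:start + group_size])
        if wanted <= window:
            return True
    return False


short_doc = "Foo term1 term2 term3 bar"
long_doc = "Foo term1 term2 term3 term4 term5 term6 term7 term8 term9 bar"

print(within_group(short_doc, ["foo", "bar"], 10))  # 5-term group
print(within_group(long_doc, ["foo", "bar"], 10))   # 11-term group
```

Under this reading, the first document qualifies and the second does not, which is the behaviour the proposed wording describes.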

Anybody willing to review and change the documentation?

 

Thanks,

James


Re: Documentation Slop (DisMax parser)

Posted by Jason Gerlowski <ge...@gmail.com>.
Hi James,

1. Good catch, and thanks for reporting it.
2. The improved wording you proposed above matches my (limited)
understanding.  Others might see something wrong that I missed, but I
think it's definitely an improvement over the current wording.
3. If you'd like, you can start the change yourself!  The
reference-guide documentation used to be much more "locked-down", but
now it lives in Asciidoc format alongside the Solr code.  Doc
bugs/improvements are handled through JIRA issues the same as any
other bugs would be.  If you're interested in opening a JIRA for this
and proposing your wording, you can get started using the instructions
here: https://wiki.apache.org/solr/HowToContribute.  Of course, if you
don't have the time or are uninterested in moving this along, I've got
a few minutes to upload a patch to JIRA on your behalf (though it
can't actually get merged without attention from a committer).

Best,

Jason
