You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by a peng <zh...@gmail.com> on 2010/06/27 16:12:15 UTC

A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Hi,

I know the indexed content contains the following text: "This is a test".
And the search phrase I used is "This is a formal test", and then I set the
slop of the PhraseQuery as 2 with setSlop(2), but I found that I can not get
a search result. If I set the search phrase as "This is formal test", then I
can get the search result.

So what is the problem here, thanks in advance.


Attached is the Java doc for the setSlop method:

public void *setSlop*(int s)

Sets the number of other words permitted between words in query phrase. If
zero, then this is an exact phrase search. For larger values this works like
a WITHIN or NEAR operator.

The slop is in fact an edit-distance, where the units correspond to moves of
terms in the query phrase out of position. For example, to switch the order
of two words requires two moves (the first move places the words atop one
another), so to permit re-orderings of phrases, the slop must be at least
two.

More exact matches are scored higher than sloppier matches, thus search
results are sorted by exactness.

The slop is zero by default, requiring exact matches.

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by Erick Erickson <er...@gmail.com>.
No, I really don't have a good solution off the top of my head, perhaps
others can chime in...

Although I suppose you could fire multiple queries dropping out
some number of search terms, but I don't know whether that
satisfies your requirements. E.g. search the following phrases
"this is a formal test"
"this is a formal"
"this is a test"
"this is formal test"
"this a formal test"
"is a formal test"

and so on...

Best
Erick

On Tue, Jun 29, 2010 at 8:30 PM, a peng <zh...@gmail.com> wrote:

> Hi Erick,
>
> Any comments about this requirement?
>
> 2010/6/29 a peng <zh...@gmail.com>
>
> > Hi Erick,
> >
> > Thanks for you reply, now I get the point why I can not get the search
> > result. But can you guide me how can I use Lucene to implement the
> following
> > search feature:
> > Basically we can call this feature "fuzzy phrase search", which means the
> > search phrase may contains more words or less words compared to the
> > sentences indexed in the document. Again I want to use the sample I
> posted
> > to demo it:
> >
> > The document contains a sentence "This is a test", and the search phrase
> is
> > "This is a formal test", as there is only one word difference between the
> > two sentences, how can I get the "This is a test" phrase as the search
> > result?
> >
> > Thanks.
> >
> >
> >
> > 2010/6/29 Erick Erickson <er...@gmail.com>
> >
> > No, I don't think so. The critical bit is that the indexed text
> >> does NOT contain the word "formal".  So searching for
> >> any phrase that DOES contain "formal" should fail no matter
> >> what the slop.
> >>
> >> Phrase queries are something like "find all the words in this
> >> search string, ignoring some number of intervening tokens not in the
> >> search string". There's nothing in there about "find only some of the
> >> words in the search string"....
> >>
> >> I'm guessing that the original post had a typo in the success case,
> >> because it's contradicted by "a peng's" second post. It's always
> >> possible that I'm experiencing a brain short......
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t....@gmail.com>
> >> wrote:
> >>
> >> > Hey Erick
> >> >
> >> > Thanks mate!
> >> >
> >> > So I guess my explanation in the mail chain above was correct!
> >> >
> >> > On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <
> >> erickerickson@gmail.com
> >> > >wrote:
> >> >
> >> > > I think you're misunderstanding the intent of PhraseQueries and
> slop.
> >> > Slop
> >> > > is the number of intervening tokens that may exist between the words
> >> > > you're looking for. However, all the words you're looking for MUST
> >> exist.
> >> > > So,
> >> > >
> >> > > <<< whenever the search phrase contains a word that don't
> >> > > exist in the document, the search result will be empty >>>
> >> > >
> >> > > is exactly how this is intended to work.
> >> > >
> >> > > HTH
> >> > > Erick
> >> > >
> >> > >
> >> > > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zh...@gmail.com>
> >> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > My test result is that whenever the search phrase contains a word
> >> that
> >> > > > don't
> >> > > > exist in the document, the search result will be empty no matter
> how
> >> > big
> >> > > > the
> >> > > > slop factor I set, seems this is a bug of Lucene, or it is work as
> >> > > design?
> >> > > >
> >> > > > 2010/6/28 tarun sapra <t....@gmail.com>
> >> > > >
> >> > > > > Hi ,
> >> > > > >
> >> > > > > I think I have been able to understand whats happening here...
> >> > > > >
> >> > > > > Indexed Content : "This is a test".
> >> > > > > your search phrase : "This is a formal test"
> >> > > > > your setting the slop factor 2 , now if your slop factor is 3 it
> >> > should
> >> > > > > work
> >> > > > > because "is" and "a" are stop words thus the words "This" and
> >> "test"
> >> > > are
> >> > > > 2
> >> > > > > slop factor apart but in your search phrase "This is a formal
> >> test"
> >> > the
> >> > > > > words "This" and "test"  are 3 slop factor thats why it's nor
> >> working
> >> > > > > now in search phrase "This is formal test" the words "This" and
> >> > "test"
> >> > > > are
> >> > > > > 2
> >> > > > > slop factor apart thats why this phrase is working.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <
> zhoudengpeng@gmail.com>
> >> > > wrote:
> >> > > > >
> >> > > > > > Hi,
> >> > > > > >
> >> > > > > > I am using StandardAnalyzer(Version.LUCENE_30);
> >> > > > > >
> >> > > > > > 2010/6/27 tarun sapra <t....@gmail.com>
> >> > > > > >
> >> > > > > > > which analyzer are you usin'?
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <
> >> zhoudengpeng@gmail.com>
> >> > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hi,
> >> > > > > > > >
> >> > > > > > > > I know the indexed content contains the following text:
> >> "This
> >> > is
> >> > > a
> >> > > > > > test".
> >> > > > > > > > And the search phrase I used is "This is a formal test",
> and
> >> > then
> >> > > I
> >> > > > > set
> >> > > > > > > the
> >> > > > > > > > slop of the PhraseQuery as 2 with setSlop(2), but I found
> >> that
> >> > I
> >> > > > can
> >> > > > > > not
> >> > > > > > > > get
> >> > > > > > > > a search result. If I set the search phrase as "This is
> >> formal
> >> > > > test",
> >> > > > > > > then
> >> > > > > > > > I
> >> > > > > > > > can get the search result.
> >> > > > > > > >
> >> > > > > > > > So what is the problem here, thanks in advance.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Attached is the Java doc for the setSlop method:
> >> > > > > > > >
> >> > > > > > > > public void *setSlop*(int s)
> >> > > > > > > >
> >> > > > > > > > Sets the number of other words permitted between words in
> >> query
> >> > > > > phrase.
> >> > > > > > > If
> >> > > > > > > > zero, then this is an exact phrase search. For larger
> values
> >> > this
> >> > > > > works
> >> > > > > > > > like
> >> > > > > > > > a WITHIN or NEAR operator.
> >> > > > > > > >
> >> > > > > > > > The slop is in fact an edit-distance, where the units
> >> > correspond
> >> > > to
> >> > > > > > moves
> >> > > > > > > > of
> >> > > > > > > > terms in the query phrase out of position. For example, to
> >> > switch
> >> > > > the
> >> > > > > > > order
> >> > > > > > > > of two words requires two moves (the first move places the
> >> > words
> >> > > > atop
> >> > > > > > one
> >> > > > > > > > another), so to permit re-orderings of phrases, the slop
> >> must
> >> > be
> >> > > at
> >> > > > > > least
> >> > > > > > > > two.
> >> > > > > > > >
> >> > > > > > > > More exact matches are scored higher than sloppier
> matches,
> >> > thus
> >> > > > > search
> >> > > > > > > > results are sorted by exactness.
> >> > > > > > > >
> >> > > > > > > > The slop is zero by default, requiring exact matches.
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > --
> >> > > > > > > Thanks & Regards
> >> > > > > > > Tarun Sapra
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > Thanks & Regards
> >> > > > > Tarun Sapra
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks & Regards
> >> > Tarun Sapra
> >> >
> >>
> >
> >
>

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by a peng <zh...@gmail.com>.
Hi Erick,

Any comments about this requirement?

2010/6/29 a peng <zh...@gmail.com>

> Hi Erick,
>
> Thanks for you reply, now I get the point why I can not get the search
> result. But can you guide me how can I use Lucene to implement the following
> search feature:
> Basically we can call this feature "fuzzy phrase search", which means the
> search phrase may contains more words or less words compared to the
> sentences indexed in the document. Again I want to use the sample I posted
> to demo it:
>
> The document contains a sentence "This is a test", and the search phrase is
> "This is a formal test", as there is only one word difference between the
> two sentences, how can I get the "This is a test" phrase as the search
> result?
>
> Thanks.
>
>
>
> 2010/6/29 Erick Erickson <er...@gmail.com>
>
> No, I don't think so. The critical bit is that the indexed text
>> does NOT contain the word "formal".  So searching for
>> any phrase that DOES contain "formal" should fail no matter
>> what the slop.
>>
>> Phrase queries are something like "find all the words in this
>> search string, ignoring some number of intervening tokens not in the
>> search string". There's nothing in there about "find only some of the
>> words in the search string"....
>>
>> I'm guessing that the original post had a typo in the success case,
>> because it's contradicted by "a peng's" second post. It's always
>> possible that I'm experiencing a brain short......
>>
>> Best
>> Erick
>>
>> On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t....@gmail.com>
>> wrote:
>>
>> > Hey Erick
>> >
>> > Thanks mate!
>> >
>> > So I guess my explanation in the mail chain above was correct!
>> >
>> > On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <
>> erickerickson@gmail.com
>> > >wrote:
>> >
>> > > I think you're misunderstanding the intent of PhraseQueries and slop.
>> > Slop
>> > > is the number of intervening tokens that may exist between the words
>> > > you're looking for. However, all the words you're looking for MUST
>> exist.
>> > > So,
>> > >
>> > > <<< whenever the search phrase contains a word that don't
>> > > exist in the document, the search result will be empty >>>
>> > >
>> > > is exactly how this is intended to work.
>> > >
>> > > HTH
>> > > Erick
>> > >
>> > >
>> > > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zh...@gmail.com>
>> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > My test result is that whenever the search phrase contains a word
>> that
>> > > > don't
>> > > > exist in the document, the search result will be empty no matter how
>> > big
>> > > > the
>> > > > slop factor I set, seems this is a bug of Lucene, or it is work as
>> > > design?
>> > > >
>> > > > 2010/6/28 tarun sapra <t....@gmail.com>
>> > > >
>> > > > > Hi ,
>> > > > >
>> > > > > I think I have been able to understand whats happening here...
>> > > > >
>> > > > > Indexed Content : "This is a test".
>> > > > > your search phrase : "This is a formal test"
>> > > > > your setting the slop factor 2 , now if your slop factor is 3 it
>> > should
>> > > > > work
>> > > > > because "is" and "a" are stop words thus the words "This" and
>> "test"
>> > > are
>> > > > 2
>> > > > > slop factor apart but in your search phrase "This is a formal
>> test"
>> > the
>> > > > > words "This" and "test"  are 3 slop factor thats why it's nor
>> working
>> > > > > now in search phrase "This is formal test" the words "This" and
>> > "test"
>> > > > are
>> > > > > 2
>> > > > > slop factor apart thats why this phrase is working.
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <zh...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > I am using StandardAnalyzer(Version.LUCENE_30);
>> > > > > >
>> > > > > > 2010/6/27 tarun sapra <t....@gmail.com>
>> > > > > >
>> > > > > > > which analyzer are you usin'?
>> > > > > > >
>> > > > > > >
>> > > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <
>> zhoudengpeng@gmail.com>
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > I know the indexed content contains the following text:
>> "This
>> > is
>> > > a
>> > > > > > test".
>> > > > > > > > And the search phrase I used is "This is a formal test", and
>> > then
>> > > I
>> > > > > set
>> > > > > > > the
>> > > > > > > > slop of the PhraseQuery as 2 with setSlop(2), but I found
>> that
>> > I
>> > > > can
>> > > > > > not
>> > > > > > > > get
>> > > > > > > > a search result. If I set the search phrase as "This is
>> formal
>> > > > test",
>> > > > > > > then
>> > > > > > > > I
>> > > > > > > > can get the search result.
>> > > > > > > >
>> > > > > > > > So what is the problem here, thanks in advance.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Attached is the Java doc for the setSlop method:
>> > > > > > > >
>> > > > > > > > public void *setSlop*(int s)
>> > > > > > > >
>> > > > > > > > Sets the number of other words permitted between words in
>> query
>> > > > > phrase.
>> > > > > > > If
>> > > > > > > > zero, then this is an exact phrase search. For larger values
>> > this
>> > > > > works
>> > > > > > > > like
>> > > > > > > > a WITHIN or NEAR operator.
>> > > > > > > >
>> > > > > > > > The slop is in fact an edit-distance, where the units
>> > correspond
>> > > to
>> > > > > > moves
>> > > > > > > > of
>> > > > > > > > terms in the query phrase out of position. For example, to
>> > switch
>> > > > the
>> > > > > > > order
>> > > > > > > > of two words requires two moves (the first move places the
>> > words
>> > > > atop
>> > > > > > one
>> > > > > > > > another), so to permit re-orderings of phrases, the slop
>> must
>> > be
>> > > at
>> > > > > > least
>> > > > > > > > two.
>> > > > > > > >
>> > > > > > > > More exact matches are scored higher than sloppier matches,
>> > thus
>> > > > > search
>> > > > > > > > results are sorted by exactness.
>> > > > > > > >
>> > > > > > > > The slop is zero by default, requiring exact matches.
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Thanks & Regards
>> > > > > > > Tarun Sapra
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Thanks & Regards
>> > > > > Tarun Sapra
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks & Regards
>> > Tarun Sapra
>> >
>>
>
>

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by a peng <zh...@gmail.com>.
Hi Erick,

Thanks for you reply, now I get the point why I can not get the search
result. But can you guide me how can I use Lucene to implement the following
search feature:
Basically we can call this feature "fuzzy phrase search", which means the
search phrase may contains more words or less words compared to the
sentences indexed in the document. Again I want to use the sample I posted
to demo it:

The document contains a sentence "This is a test", and the search phrase is
"This is a formal test", as there is only one word difference between the
two sentences, how can I get the "This is a test" phrase as the search
result?

Thanks.



2010/6/29 Erick Erickson <er...@gmail.com>

> No, I don't think so. The critical bit is that the indexed text
> does NOT contain the word "formal".  So searching for
> any phrase that DOES contain "formal" should fail no matter
> what the slop.
>
> Phrase queries are something like "find all the words in this
> search string, ignoring some number of intervening tokens not in the
> search string". There's nothing in there about "find only some of the
> words in the search string"....
>
> I'm guessing that the original post had a typo in the success case,
> because it's contradicted by "a peng's" second post. It's always
> possible that I'm experiencing a brain short......
>
> Best
> Erick
>
> On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t....@gmail.com> wrote:
>
> > Hey Erick
> >
> > Thanks mate!
> >
> > So I guess my explanation in the mail chain above was correct!
> >
> > On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <erickerickson@gmail.com
> > >wrote:
> >
> > > I think you're misunderstanding the intent of PhraseQueries and slop.
> > Slop
> > > is the number of intervening tokens that may exist between the words
> > > you're looking for. However, all the words you're looking for MUST
> exist.
> > > So,
> > >
> > > <<< whenever the search phrase contains a word that don't
> > > exist in the document, the search result will be empty >>>
> > >
> > > is exactly how this is intended to work.
> > >
> > > HTH
> > > Erick
> > >
> > >
> > > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zh...@gmail.com>
> wrote:
> > >
> > > > Hi,
> > > >
> > > > My test result is that whenever the search phrase contains a word
> that
> > > > don't
> > > > exist in the document, the search result will be empty no matter how
> > big
> > > > the
> > > > slop factor I set, seems this is a bug of Lucene, or it is work as
> > > design?
> > > >
> > > > 2010/6/28 tarun sapra <t....@gmail.com>
> > > >
> > > > > Hi ,
> > > > >
> > > > > I think I have been able to understand whats happening here...
> > > > >
> > > > > Indexed Content : "This is a test".
> > > > > your search phrase : "This is a formal test"
> > > > > your setting the slop factor 2 , now if your slop factor is 3 it
> > should
> > > > > work
> > > > > because "is" and "a" are stop words thus the words "This" and
> "test"
> > > are
> > > > 2
> > > > > slop factor apart but in your search phrase "This is a formal test"
> > the
> > > > > words "This" and "test"  are 3 slop factor thats why it's nor
> working
> > > > > now in search phrase "This is formal test" the words "This" and
> > "test"
> > > > are
> > > > > 2
> > > > > slop factor apart thats why this phrase is working.
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <zh...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am using StandardAnalyzer(Version.LUCENE_30);
> > > > > >
> > > > > > 2010/6/27 tarun sapra <t....@gmail.com>
> > > > > >
> > > > > > > which analyzer are you usin'?
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <
> zhoudengpeng@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I know the indexed content contains the following text: "This
> > is
> > > a
> > > > > > test".
> > > > > > > > And the search phrase I used is "This is a formal test", and
> > then
> > > I
> > > > > set
> > > > > > > the
> > > > > > > > slop of the PhraseQuery as 2 with setSlop(2), but I found
> that
> > I
> > > > can
> > > > > > not
> > > > > > > > get
> > > > > > > > a search result. If I set the search phrase as "This is
> formal
> > > > test",
> > > > > > > then
> > > > > > > > I
> > > > > > > > can get the search result.
> > > > > > > >
> > > > > > > > So what is the problem here, thanks in advance.
> > > > > > > >
> > > > > > > >
> > > > > > > > Attached is the Java doc for the setSlop method:
> > > > > > > >
> > > > > > > > public void *setSlop*(int s)
> > > > > > > >
> > > > > > > > Sets the number of other words permitted between words in
> query
> > > > > phrase.
> > > > > > > If
> > > > > > > > zero, then this is an exact phrase search. For larger values
> > this
> > > > > works
> > > > > > > > like
> > > > > > > > a WITHIN or NEAR operator.
> > > > > > > >
> > > > > > > > The slop is in fact an edit-distance, where the units
> > correspond
> > > to
> > > > > > moves
> > > > > > > > of
> > > > > > > > terms in the query phrase out of position. For example, to
> > switch
> > > > the
> > > > > > > order
> > > > > > > > of two words requires two moves (the first move places the
> > words
> > > > atop
> > > > > > one
> > > > > > > > another), so to permit re-orderings of phrases, the slop must
> > be
> > > at
> > > > > > least
> > > > > > > > two.
> > > > > > > >
> > > > > > > > More exact matches are scored higher than sloppier matches,
> > thus
> > > > > search
> > > > > > > > results are sorted by exactness.
> > > > > > > >
> > > > > > > > The slop is zero by default, requiring exact matches.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Thanks & Regards
> > > > > > > Tarun Sapra
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks & Regards
> > > > > Tarun Sapra
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Regards
> > Tarun Sapra
> >
>

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by Erick Erickson <er...@gmail.com>.
No, I don't think so. The critical bit is that the indexed text
does NOT contain the word "formal".  So searching for
any phrase that DOES contain "formal" should fail no matter
what the slop.

Phrase queries are something like "find all the words in this
search string, ignoring some number of intervening tokens not in the
search string". There's nothing in there about "find only some of the
words in the search string"....

I'm guessing that the original post had a typo in the success case,
because it's contradicted by "a peng's" second post. It's always
possible that I'm experiencing a brain short......

Best
Erick

On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t....@gmail.com> wrote:

> Hey Erick
>
> Thanks mate!
>
> So I guess my explanation in the mail chain above was correct!
>
> On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <erickerickson@gmail.com
> >wrote:
>
> > I think you're misunderstanding the intent of PhraseQueries and slop.
> Slop
> > is the number of intervening tokens that may exist between the words
> > you're looking for. However, all the words you're looking for MUST exist.
> > So,
> >
> > <<< whenever the search phrase contains a word that don't
> > exist in the document, the search result will be empty >>>
> >
> > is exactly how this is intended to work.
> >
> > HTH
> > Erick
> >
> >
> > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zh...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > My test result is that whenever the search phrase contains a word that
> > > don't
> > > exist in the document, the search result will be empty no matter how
> big
> > > the
> > > slop factor I set, seems this is a bug of Lucene, or it is work as
> > design?
> > >
> > > 2010/6/28 tarun sapra <t....@gmail.com>
> > >
> > > > Hi ,
> > > >
> > > > I think I have been able to understand whats happening here...
> > > >
> > > > Indexed Content : "This is a test".
> > > > your search phrase : "This is a formal test"
> > > > your setting the slop factor 2 , now if your slop factor is 3 it
> should
> > > > work
> > > > because "is" and "a" are stop words thus the words "This" and "test"
> > are
> > > 2
> > > > slop factor apart but in your search phrase "This is a formal test"
> the
> > > > words "This" and "test"  are 3 slop factor thats why it's nor working
> > > > now in search phrase "This is formal test" the words "This" and
> "test"
> > > are
> > > > 2
> > > > slop factor apart thats why this phrase is working.
> > > >
> > > >
> > > >
> > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <zh...@gmail.com>
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am using StandardAnalyzer(Version.LUCENE_30);
> > > > >
> > > > > 2010/6/27 tarun sapra <t....@gmail.com>
> > > > >
> > > > > > which analyzer are you usin'?
> > > > > >
> > > > > >
> > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <zh...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I know the indexed content contains the following text: "This
> is
> > a
> > > > > test".
> > > > > > > And the search phrase I used is "This is a formal test", and
> then
> > I
> > > > set
> > > > > > the
> > > > > > > slop of the PhraseQuery as 2 with setSlop(2), but I found that
> I
> > > can
> > > > > not
> > > > > > > get
> > > > > > > a search result. If I set the search phrase as "This is formal
> > > test",
> > > > > > then
> > > > > > > I
> > > > > > > can get the search result.
> > > > > > >
> > > > > > > So what is the problem here, thanks in advance.
> > > > > > >
> > > > > > >
> > > > > > > Attached is the Java doc for the setSlop method:
> > > > > > >
> > > > > > > public void *setSlop*(int s)
> > > > > > >
> > > > > > > Sets the number of other words permitted between words in query
> > > > phrase.
> > > > > > If
> > > > > > > zero, then this is an exact phrase search. For larger values
> this
> > > > works
> > > > > > > like
> > > > > > > a WITHIN or NEAR operator.
> > > > > > >
> > > > > > > The slop is in fact an edit-distance, where the units
> correspond
> > to
> > > > > moves
> > > > > > > of
> > > > > > > terms in the query phrase out of position. For example, to
> switch
> > > the
> > > > > > order
> > > > > > > of two words requires two moves (the first move places the
> words
> > > atop
> > > > > one
> > > > > > > another), so to permit re-orderings of phrases, the slop must
> be
> > at
> > > > > least
> > > > > > > two.
> > > > > > >
> > > > > > > More exact matches are scored higher than sloppier matches,
> thus
> > > > search
> > > > > > > results are sorted by exactness.
> > > > > > >
> > > > > > > The slop is zero by default, requiring exact matches.
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Thanks & Regards
> > > > > > Tarun Sapra
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Tarun Sapra
> > > >
> > >
> >
>
>
>
> --
> Thanks & Regards
> Tarun Sapra
>

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by tarun sapra <t....@gmail.com>.
Hey Erick

Thanks mate!

So I guess my explanation in the mail chain above was correct!

On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <er...@gmail.com>wrote:

> I think you're misunderstanding the intent of PhraseQueries and slop. Slop
> is the number of intervening tokens that may exist between the words
> you're looking for. However, all the words you're looking for MUST exist.
> So,
>
> <<< whenever the search phrase contains a word that don't
> exist in the document, the search result will be empty >>>
>
> is exactly how this is intended to work.
>
> HTH
> Erick
>
>
> On Mon, Jun 28, 2010 at 9:09 AM, a peng <zh...@gmail.com> wrote:
>
> > Hi,
> >
> > My test result is that whenever the search phrase contains a word that
> > don't
> > exist in the document, the search result will be empty no matter how big
> > the
> > slop factor I set, seems this is a bug of Lucene, or it is work as
> design?
> >
> > 2010/6/28 tarun sapra <t....@gmail.com>
> >
> > > Hi ,
> > >
> > > I think I have been able to understand whats happening here...
> > >
> > > Indexed Content : "This is a test".
> > > your search phrase : "This is a formal test"
> > > your setting the slop factor 2 , now if your slop factor is 3 it should
> > > work
> > > because "is" and "a" are stop words thus the words "This" and "test"
> are
> > 2
> > > slop factor apart but in your search phrase "This is a formal test" the
> > > words "This" and "test"  are 3 slop factor thats why it's nor working
> > > now in search phrase "This is formal test" the words "This" and "test"
> > are
> > > 2
> > > slop factor apart thats why this phrase is working.
> > >
> > >
> > >
> > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <zh...@gmail.com>
> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am using StandardAnalyzer(Version.LUCENE_30);
> > > >
> > > > 2010/6/27 tarun sapra <t....@gmail.com>
> > > >
> > > > > which analyzer are you usin'?
> > > > >
> > > > >
> > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <zh...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I know the indexed content contains the following text: "This is
> a
> > > > test".
> > > > > > And the search phrase I used is "This is a formal test", and then
> I
> > > set
> > > > > the
> > > > > > slop of the PhraseQuery as 2 with setSlop(2), but I found that I
> > can
> > > > not
> > > > > > get
> > > > > > a search result. If I set the search phrase as "This is formal
> > test",
> > > > > then
> > > > > > I
> > > > > > can get the search result.
> > > > > >
> > > > > > So what is the problem here, thanks in advance.
> > > > > >
> > > > > >
> > > > > > Attached is the Java doc for the setSlop method:
> > > > > >
> > > > > > public void *setSlop*(int s)
> > > > > >
> > > > > > Sets the number of other words permitted between words in query
> > > phrase.
> > > > > If
> > > > > > zero, then this is an exact phrase search. For larger values this
> > > works
> > > > > > like
> > > > > > a WITHIN or NEAR operator.
> > > > > >
> > > > > > The slop is in fact an edit-distance, where the units correspond
> to
> > > > moves
> > > > > > of
> > > > > > terms in the query phrase out of position. For example, to switch
> > the
> > > > > order
> > > > > > of two words requires two moves (the first move places the words
> > atop
> > > > one
> > > > > > another), so to permit re-orderings of phrases, the slop must be
> at
> > > > least
> > > > > > two.
> > > > > >
> > > > > > More exact matches are scored higher than sloppier matches, thus
> > > search
> > > > > > results are sorted by exactness.
> > > > > >
> > > > > > The slop is zero by default, requiring exact matches.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks & Regards
> > > > > Tarun Sapra
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Regards
> > > Tarun Sapra
> > >
> >
>



-- 
Thanks & Regards
Tarun Sapra

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by Erick Erickson <er...@gmail.com>.
I think you're misunderstanding the intent of PhraseQueries and slop. Slop
is the number of intervening tokens that may exist between the words
you're looking for. However, all the words you're looking for MUST exist.
So,

<<< whenever the search phrase contains a word that don't
exist in the document, the search result will be empty >>>

is exactly how this is intended to work.

HTH
Erick


On Mon, Jun 28, 2010 at 9:09 AM, a peng <zh...@gmail.com> wrote:

> Hi,
>
> My test result is that whenever the search phrase contains a word that
> don't
> exist in the document, the search result will be empty no matter how big
> the
> slop factor I set, seems this is a bug of Lucene, or it is work as design?
>
> 2010/6/28 tarun sapra <t....@gmail.com>
>
> > Hi ,
> >
> > I think I have been able to understand whats happening here...
> >
> > Indexed Content : "This is a test".
> > your search phrase : "This is a formal test"
> > your setting the slop factor 2 , now if your slop factor is 3 it should
> > work
> > because "is" and "a" are stop words thus the words "This" and "test" are
> 2
> > slop factor apart but in your search phrase "This is a formal test" the
> > words "This" and "test"  are 3 slop factor thats why it's nor working
> > now in search phrase "This is formal test" the words "This" and "test"
> are
> > 2
> > slop factor apart thats why this phrase is working.
> >
> >
> >
> > On Mon, Jun 28, 2010 at 11:37 AM, a peng <zh...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am using StandardAnalyzer(Version.LUCENE_30);
> > >
> > > 2010/6/27 tarun sapra <t....@gmail.com>
> > >
> > > > which analyzer are you usin'?
> > > >
> > > >
> > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <zh...@gmail.com>
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I know the indexed content contains the following text: "This is a
> > > test".
> > > > > And the search phrase I used is "This is a formal test", and then I
> > set
> > > > the
> > > > > slop of the PhraseQuery as 2 with setSlop(2), but I found that I
> can
> > > not
> > > > > get
> > > > > a search result. If I set the search phrase as "This is formal
> test",
> > > > then
> > > > > I
> > > > > can get the search result.
> > > > >
> > > > > So what is the problem here, thanks in advance.
> > > > >
> > > > >
> > > > > Attached is the Java doc for the setSlop method:
> > > > >
> > > > > public void *setSlop*(int s)
> > > > >
> > > > > Sets the number of other words permitted between words in query
> > phrase.
> > > > If
> > > > > zero, then this is an exact phrase search. For larger values this
> > works
> > > > > like
> > > > > a WITHIN or NEAR operator.
> > > > >
> > > > > The slop is in fact an edit-distance, where the units correspond to
> > > moves
> > > > > of
> > > > > terms in the query phrase out of position. For example, to switch
> the
> > > > order
> > > > > of two words requires two moves (the first move places the words
> atop
> > > one
> > > > > another), so to permit re-orderings of phrases, the slop must be at
> > > least
> > > > > two.
> > > > >
> > > > > More exact matches are scored higher than sloppier matches, thus
> > search
> > > > > results are sorted by exactness.
> > > > >
> > > > > The slop is zero by default, requiring exact matches.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Tarun Sapra
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Regards
> > Tarun Sapra
> >
>

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by a peng <zh...@gmail.com>.
Hi,

My test result is that whenever the search phrase contains a word that don't
exist in the document, the search result will be empty no matter how big the
slop factor I set, seems this is a bug of Lucene, or it is work as design?

2010/6/28 tarun sapra <t....@gmail.com>

> Hi ,
>
> I think I have been able to understand whats happening here...
>
> Indexed Content : "This is a test".
> your search phrase : "This is a formal test"
> your setting the slop factor 2 , now if your slop factor is 3 it should
> work
> because "is" and "a" are stop words thus the words "This" and "test" are 2
> slop factor apart but in your search phrase "This is a formal test" the
> words "This" and "test"  are 3 slop factor thats why it's nor working
> now in search phrase "This is formal test" the words "This" and "test" are
> 2
> slop factor apart thats why this phrase is working.
>
>
>
> On Mon, Jun 28, 2010 at 11:37 AM, a peng <zh...@gmail.com> wrote:
>
> > Hi,
> >
> > I am using StandardAnalyzer(Version.LUCENE_30);
> >
> > 2010/6/27 tarun sapra <t....@gmail.com>
> >
> > > which analyzer are you usin'?
> > >
> > >
> > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <zh...@gmail.com>
> wrote:
> > >
> > > > Hi,
> > > >
> > > > I know the indexed content contains the following text: "This is a
> > test".
> > > > And the search phrase I used is "This is a formal test", and then I
> set
> > > the
> > > > slop of the PhraseQuery as 2 with setSlop(2), but I found that I can
> > not
> > > > get
> > > > a search result. If I set the search phrase as "This is formal test",
> > > then
> > > > I
> > > > can get the search result.
> > > >
> > > > So what is the problem here, thanks in advance.
> > > >
> > > >
> > > > Attached is the Java doc for the setSlop method:
> > > >
> > > > public void *setSlop*(int s)
> > > >
> > > > Sets the number of other words permitted between words in query
> phrase.
> > > If
> > > > zero, then this is an exact phrase search. For larger values this
> works
> > > > like
> > > > a WITHIN or NEAR operator.
> > > >
> > > > The slop is in fact an edit-distance, where the units correspond to
> > moves
> > > > of
> > > > terms in the query phrase out of position. For example, to switch the
> > > order
> > > > of two words requires two moves (the first move places the words atop
> > one
> > > > another), so to permit re-orderings of phrases, the slop must be at
> > least
> > > > two.
> > > >
> > > > More exact matches are scored higher than sloppier matches, thus
> search
> > > > results are sorted by exactness.
> > > >
> > > > The slop is zero by default, requiring exact matches.
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Regards
> > > Tarun Sapra
> > >
> >
>
>
>
> --
> Thanks & Regards
> Tarun Sapra
>

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by tarun sapra <t....@gmail.com>.
Hi ,

I think I have been able to understand whats happening here...

Indexed Content : "This is a test".
your search phrase : "This is a formal test"
your setting the slop factor 2 , now if your slop factor is 3 it should work
because "is" and "a" are stop words thus the words "This" and "test" are 2
slop factor apart but in your search phrase "This is a formal test" the
words "This" and "test"  are 3 slop factor thats why it's nor working
now in search phrase "This is formal test" the words "This" and "test" are 2
slop factor apart thats why this phrase is working.



On Mon, Jun 28, 2010 at 11:37 AM, a peng <zh...@gmail.com> wrote:

> Hi,
>
> I am using StandardAnalyzer(Version.LUCENE_30);
>
> 2010/6/27 tarun sapra <t....@gmail.com>
>
> > which analyzer are you usin'?
> >
> >
> > On Sun, Jun 27, 2010 at 7:12 AM, a peng <zh...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I know the indexed content contains the following text: "This is a
> test".
> > > And the search phrase I used is "This is a formal test", and then I set
> > the
> > > slop of the PhraseQuery as 2 with setSlop(2), but I found that I can
> not
> > > get
> > > a search result. If I set the search phrase as "This is formal test",
> > then
> > > I
> > > can get the search result.
> > >
> > > So what is the problem here, thanks in advance.
> > >
> > >
> > > Attached is the Java doc for the setSlop method:
> > >
> > > public void *setSlop*(int s)
> > >
> > > Sets the number of other words permitted between words in query phrase.
> > If
> > > zero, then this is an exact phrase search. For larger values this works
> > > like
> > > a WITHIN or NEAR operator.
> > >
> > > The slop is in fact an edit-distance, where the units correspond to
> moves
> > > of
> > > terms in the query phrase out of position. For example, to switch the
> > order
> > > of two words requires two moves (the first move places the words atop
> one
> > > another), so to permit re-orderings of phrases, the slop must be at
> least
> > > two.
> > >
> > > More exact matches are scored higher than sloppier matches, thus search
> > > results are sorted by exactness.
> > >
> > > The slop is zero by default, requiring exact matches.
> > >
> >
> >
> >
> > --
> > Thanks & Regards
> > Tarun Sapra
> >
>



-- 
Thanks & Regards
Tarun Sapra

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by a peng <zh...@gmail.com>.
Hi,

I am using StandardAnalyzer(Version.LUCENE_30);

2010/6/27 tarun sapra <t....@gmail.com>

> which analyzer are you usin'?
>
>
> On Sun, Jun 27, 2010 at 7:12 AM, a peng <zh...@gmail.com> wrote:
>
> > Hi,
> >
> > I know the indexed content contains the following text: "This is a test".
> > And the search phrase I used is "This is a formal test", and then I set
> the
> > slop of the PhraseQuery as 2 with setSlop(2), but I found that I can not
> > get
> > a search result. If I set the search phrase as "This is formal test",
> then
> > I
> > can get the search result.
> >
> > So what is the problem here, thanks in advance.
> >
> >
> > Attached is the Java doc for the setSlop method:
> >
> > public void *setSlop*(int s)
> >
> > Sets the number of other words permitted between words in query phrase.
> If
> > zero, then this is an exact phrase search. For larger values this works
> > like
> > a WITHIN or NEAR operator.
> >
> > The slop is in fact an edit-distance, where the units correspond to moves
> > of
> > terms in the query phrase out of position. For example, to switch the
> order
> > of two words requires two moves (the first move places the words atop one
> > another), so to permit re-orderings of phrases, the slop must be at least
> > two.
> >
> > More exact matches are scored higher than sloppier matches, thus search
> > results are sorted by exactness.
> >
> > The slop is zero by default, requiring exact matches.
> >
>
>
>
> --
> Thanks & Regards
> Tarun Sapra
>

Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)

Posted by tarun sapra <t....@gmail.com>.
which analyzer are you usin'?


On Sun, Jun 27, 2010 at 7:12 AM, a peng <zh...@gmail.com> wrote:

> Hi,
>
> I know the indexed content contains the following text: "This is a test".
> And the search phrase I used is "This is a formal test", and then I set the
> slop of the PhraseQuery as 2 with setSlop(2), but I found that I can not
> get
> a search result. If I set the search phrase as "This is formal test", then
> I
> can get the search result.
>
> So what is the problem here, thanks in advance.
>
>
> Attached is the Java doc for the setSlop method:
>
> public void *setSlop*(int s)
>
> Sets the number of other words permitted between words in query phrase. If
> zero, then this is an exact phrase search. For larger values this works
> like
> a WITHIN or NEAR operator.
>
> The slop is in fact an edit-distance, where the units correspond to moves
> of
> terms in the query phrase out of position. For example, to switch the order
> of two words requires two moves (the first move places the words atop one
> another), so to permit re-orderings of phrases, the slop must be at least
> two.
>
> More exact matches are scored higher than sloppier matches, thus search
> results are sorted by exactness.
>
> The slop is zero by default, requiring exact matches.
>



-- 
Thanks & Regards
Tarun Sapra