Posted to java-user@lucene.apache.org by javier muguruza <jm...@gmail.com> on 2005/12/15 11:41:26 UTC

all stop words in exact phrase get 0 hits

 Hi,

Suppose I have a query like this:
+attachments:purpose
 that returns N hits.
If I add another condition
+attachments:purpose +attachments:"hello world"
I still get some hits, but if the words in the "hello world" phrase
happen to be all stop words I get 0 hits.

I can fix that by checking that at least one of them is not a stop word,
but I just wanted to know whether that's expected behaviour.

I'm using Lucene 1.9rc.
Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: all stop words in exact phrase get 0 hits

Posted by javier muguruza <jm...@gmail.com>.
What Erik mentions is what I was looking for. I have not had a look at
the code yet, but it sounds like it. I'll have a look at the
AnalyzerDemo code tomorrow.

In my case I guess I could use Chris' approach too, but the other one
will be easier. For the record, my UI has both a 'simple' UI
with textboxes and an 'advanced' one where the user will have to use
proper QueryParser syntax, if he dares :)

thanks guys,
javier

On 12/19/05, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> Hoss - the main caveat with this approach is that a user could select
> a different field from any of the designated text boxes.  Putting
> "subject:foo" in the name text box for example.


Re: all stop words in exact phrase get 0 hits

Posted by javier muguruza <jm...@gmail.com>.
Actually I ended up using your approach, also escaping the input
string. With the other approach I wasn't sure whether I could just take
the text from the token and go on, etc.
It's working fine in my tests so far.


On 12/20/05, Chris Hostetter <ho...@fucit.org> wrote:
>
> : Hoss - the main caveat with this approach is that a user could select
> : a different field from any of the designated text boxes.  Putting
> : "subject:foo" in the name text box for example.
>
> sorry ... yea, i left out a step...
>
>      String words = QueryParser.escape(rawUserInputString);


Re: all stop words in exact phrase get 0 hits

Posted by Chris Hostetter <ho...@fucit.org>.
: Hoss - the main caveat with this approach is that a user could select
: a different field from any of the designated text boxes.  Putting
: "subject:foo" in the name text box for example.

sorry ... yeah, I left out a step...

     String words = QueryParser.escape(rawUserInputString);

Generally speaking, this approach has been working well for me in my
current development as a bare-bones method for converting raw user word
lists into pieces of queries that I then compose into a larger, more
complex query structure.  The only hitch I've ever really run into is when
people try to search for things like:  42" tv ... because QueryParser
doesn't have any way to escape that quote, so it barfs thinking it's an
unterminated phrase query.  But even if there was a way to escape it, I
don't think I'd want to, because as is it lets the (more common) case of
quoted phrases do what I expect and become phrase queries.

I still haven't completely decided what I want to do about the:   42" tv
case ... but I'm very close to preprocessing the input and saying: "even
number of quotes, leave them alone; odd number of quotes, yank them all
out."
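
The quote-balancing rule described above can be sketched in a few lines
of plain Java. This is only an illustration, not code from the thread:
the class and method names (QuoteSanitizer, sanitize) are made up, and it
assumes the check runs on the raw user string before that string is
handed to QueryParser.

```java
// Hypothetical helper illustrating the "even: keep, odd: strip" rule.
final class QuoteSanitizer {
    private QuoteSanitizer() {}

    /**
     * Returns the input unchanged when it contains an even number of
     * double quotes (including zero); otherwise removes every double
     * quote so the parser never sees an unterminated phrase.
     */
    static String sanitize(String input) {
        int quotes = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '"') {
                quotes++;
            }
        }
        return (quotes % 2 == 0) ? input : input.replace("\"", "");
    }
}
```

With this rule, 42" tv becomes 42 tv, while "flat panel" tv is left alone
and can still turn into a phrase query. The trade-off is that one stray
quote also discards any intended phrases in the same input.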

: I think this is one of my biggest issues with QueryParser these days,
: it exposes too much power.  It's like giving a SQL text box to your
: database application (read-only, of course).  There are generally
: fields not made for user querying.

agreed.  If I don't take the easy way out on that 42" case, I think I may
spend some time on a generic "UserQueryParser" that doesn't support any
fancy syntax of the existing QueryParser, and just knows how to build
TermQuery and phrase queries on a specified field -- with options for how
to deal with an odd number of quotes (looking at the white space around
the quotes for guidance in interpreting them)

-Hoss


Re: all stop words in exact phrase get 0 hits

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Hoss - the main caveat with this approach is that a user could select  
a different field from any of the designated text boxes.  Putting  
"subject:foo" in the name text box for example.

I think this is one of my biggest issues with QueryParser these days,  
it exposes too much power.  It's like giving a SQL text box to your  
database application (read-only, of course).  There are generally  
fields not made for user querying.

Caveats aside, the technique you describe is a good one and worth
considering when dealing with these types of situations.

	Erik



On Dec 19, 2005, at 2:14 PM, Chris Hostetter wrote:

> In addition to Erik's suggestion, something that I find frequently makes
> sense is to use the QueryParser in each case where you are dealing with a
> discrete "input string" -- and then combine them into a BooleanQuery
> (along with any other programmatically generated TermQueries you might
> want for other reasons)


Re: all stop words in exact phrase get 0 hits

Posted by Chris Hostetter <ho...@fucit.org>.
: > I have moved from my approach:
: > Query query = QueryParser.parse("big lucene expression", "afield",
: > LuceneHelper.getAnalyzer());
: >
: > to building the query based on BooleanQuery, PhraseQuery, TermQuery
: > etc. But before, the analyzer was doing a bunch of work on my incoming
: > words, and I don't see an easy way to do the same sort of stuff to the
: > incoming words used in PhraseQuery, TermQuery etc.

In addition to Erik's suggestion, something that I find frequently makes
sense is to use the QueryParser in each case where you are dealing with a
discrete "input string" -- and then combine them into a BooleanQuery
(along with any other programmatically generated TermQueries you might want
for other reasons)

For example, it looks like your application is targeted at searching email,
right?  Let's assume your application allows the user to specify the
following inputs

    * a text box for search words
    * a text box for people's names/email
    * pulldowns for picking date ranges

...and you want to look in the subject, body, and attachment fields for
the input words (with different boosts) and in all of the header fields
for the name/email input.   You can build a Date object out of the pulldown
yourself, and then do something like this with the two strings...

   // SomeAnalyzer and AnalyzerThatKnowsAboutEmailAddress stand for Analyzer
   // instances; words and person are the two user input strings.
   QueryParser subP = new QueryParser("subject",SomeAnalyzer);
   QueryParser bodyP = new QueryParser("body",SomeAnalyzer);
   QueryParser attachP = new QueryParser("attachments",SomeAnalyzer);
   QueryParser headP = new QueryParser("headers",AnalyzerThatKnowsAboutEmailAddress);
   Query s = subP.parse(words);
   s.setBoost(2.0f);   // setBoost takes a float
   Query b = bodyP.parse(words);
   Query a = attachP.parse(words);
   a.setBoost(0.5f);
   Query h = headP.parse(person);
   h.setBoost(1.5f);
   BooleanQuery bq = new BooleanQuery();
   // add(query, required, prohibited): false,false makes each clause optional
   bq.add(s,false,false);
   bq.add(b,false,false);
   bq.add(a,false,false);
   bq.add(h,false,false);

...and then execute your search using a filter built from the date range
pulldowns.


-Hoss


Re: all stop words in exact phrase get 0 hits

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 19, 2005, at 11:30 AM, javier muguruza wrote:

> I have moved from my approach:
> Query query = QueryParser.parse("big lucene expression", "afield",
> LuceneHelper.getAnalyzer());
>
> to building the query based on BooleanQuery, PhraseQuery, TermQuery
> etc. But before, the analyzer was doing a bunch of work on my incoming
> words, and I don't see an easy way to do the same sort of stuff to the
> incoming words used in PhraseQuery, TermQuery etc.
>
> Isn't there a way to do this that I'm missing? Is there an example
> somewhere (LIA etc.)?

Look at the AnalyzerDemo code.  There isn't anything built in, as  
doing it only involves a few lines of code.  Run a String (via  
StringReader) through an Analyzer, pluck out the tokens from the  
resultant TokenStream, create the queries you want from those  
tokens.   I use this technique frequently to bypass the  
idiosyncrasies of QueryParser or in places where using the QP syntax  
is undesirable but analysis still needs to occur.

	Erik
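
A library-free sketch of the flow Erik describes: analyze the text, pluck
out the tokens, and build a query only from what survives. Nothing here
uses Lucene classes. analyze() is a stand-in for running a String through
a real Analyzer, and the stop list, the class name, and the string
descriptions of the resulting queries are all hypothetical. It shows one
possible way to handle the all-stop-words case that started this thread:
when analysis leaves no tokens, no clause is produced at all, rather than
an empty required one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzedClauseSketch {
    // Hypothetical stop list; a real Analyzer supplies its own.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "i", "have"));

    // Stand-in for analysis: lowercase, split on whitespace, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Describes the clause to build for one field, or returns null when
    // analysis removed every word and no clause should be added.
    static String describeClause(String field, String text) {
        List<String> tokens = analyze(text);
        if (tokens.isEmpty()) {
            return null;
        }
        if (tokens.size() == 1) {
            return "TermQuery(" + field + ":" + tokens.get(0) + ")";
        }
        return "PhraseQuery(" + field + ":\"" + String.join(" ", tokens) + "\")";
    }
}
```

A caller would add a clause to its BooleanQuery only when
describeClause(...) returns non-null, so a phrase such as "I have" simply
contributes nothing instead of making the whole query match zero
documents.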


Re: all stop words in exact phrase get 0 hits

Posted by javier muguruza <jm...@gmail.com>.
I have moved from my approach:
Query query = QueryParser.parse("big lucene expression", "afield",
LuceneHelper.getAnalyzer());

to building the query based on BooleanQuery, PhraseQuery, TermQuery
etc. But before, the analyzer was doing a bunch of work on my incoming
words, and I don't see an easy way to do the same sort of stuff to the
incoming words used in PhraseQuery, TermQuery etc.

Isn't there a way to do this that I'm missing? Is there an example
somewhere (LIA etc.)?

thanks

On 12/17/05, Yonik Seeley <ys...@gmail.com> wrote:
> Ah, sorry.
>
> Still, this doesn't seem like the most desirable behavior.
> It would be nice if we could find a way to fix it.
>
> -Yonik


Re: all stop words in exact phrase get 0 hits

Posted by Yonik Seeley <ys...@gmail.com>.
Ah, sorry.

Still, this doesn't seem like the most desirable behavior.
It would be nice if we could find a way to fix it.

-Yonik

On 12/16/05, javier muguruza <jm...@gmail.com> wrote:
> Yonik,
>
> this was due to the additional parentheses I was using, see my last
> email. I think I'll rewrite with the Lucene API as Erik said.
>
> thanks,
> javi


Re: all stop words in exact phrase get 0 hits

Posted by javier muguruza <jm...@gmail.com>.
Yonik,

this was due to the additional parentheses I was using, see my last
email. I think I'll rewrite with the Lucene API as Erik said.

thanks,
javi


On 12/16/05, Yonik Seeley <ys...@gmail.com> wrote:
> I can't reproduce this behavior with the current version of Lucene.
>
> +text:solar  => 112 docs
> +text:"a a a" => 0 docs because a is a stop word
> +text:"solar" +text:"a a a" => 112 docs
>
> -Yonik


Re: all stop words in exact phrase get 0 hits

Posted by Yonik Seeley <ys...@gmail.com>.
I can't reproduce this behavior with the current version of Lucene.

+text:solar  => 112 docs
+text:"a a a" => 0 docs because a is a stop word
+text:"solar" +text:"a a a" => 112 docs

-Yonik


On 12/15/05, javier muguruza <jm...@gmail.com> wrote:
>  Hi,
>
> Suppose I have a query like this:
> +attachments:purpose
>  that returns N hits.
> If I add another condition
> +attachments:purpose +attachments:"hello world"
> I still get some hits, but if the words in the "hello world" phrase
> happen to be all stop words I get 0 hits.


Re: all stop words in exact phrase get 0 hits

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 15, 2005, at 10:34 AM, javier muguruza wrote:

> thanks all,
>
> Yes, I know + means it must be true, but the phrase goes through the
> same analyzer, so stop words are removed....
>
> I did some debugging and got this:
> Query query = QueryParser.parse(searchexp, "body", LuceneHelper.getAnalyzer());
>
> If searchexp is
> ((+(body:"I have")) OR (+(attachments:"I have")))
> query.toString() is
> (+()) (+())
>
> but if searchexp is
> +body:"I have" OR +attachments:"I have"
> query.toString() is blank
>
> so that makes it work differently, I think. I have lots of parentheses
> because I build the string in code from multiple UI boxes etc., to make
> sure ANDs and ORs are properly handled.

If you're building up a Query from a UI, I strongly recommend you do
so using the API rather than QueryParser.  It may make sense to parse,
or just analyze, pieces of the expression, of course, but QueryParser
adds complexity to the equation.  Using BooleanQuery with nested
PhraseQuery instances and such directly will provide a much cleaner
Query and much less trouble.

	Erik


Re: all stop words in exact phrase get 0 hits

Posted by javier muguruza <jm...@gmail.com>.
thanks all,

Yes, I know + means it must be true, but the phrase goes through the
same analyzer, so stop words are removed....

I did some debugging and got this:
Query query = QueryParser.parse(searchexp, "body", LuceneHelper.getAnalyzer());

If searchexp is
((+(body:"I have")) OR (+(attachments:"I have")))
query.toString() is
(+()) (+())

but if searchexp is
+body:"I have" OR +attachments:"I have"
query.toString() is blank

so that makes it work differently, I think. I have lots of parentheses
because I build the string in code from multiple UI boxes etc., to make
sure ANDs and ORs are properly handled.

As I said, I can work around that in code, but I think that is a bug, no?
It should realize that (+()) (+()) is just parentheses...

javier
On 12/15/05, Yonik Seeley <ys...@gmail.com> wrote:
> Are you using the same Analyzer for both indexing and querying (or the
> same StopFilter at least)?
>
> -Yonik


Re: all stop words in exact phrase get 0 hits

Posted by Yonik Seeley <ys...@gmail.com>.
Are you using the same Analyzer for both indexing and querying (or the
same StopFilter at least)?

-Yonik

On 12/15/05, javier muguruza <jm...@gmail.com> wrote:
>  Hi,
>
> Suppose I have a query like this:
> +attachments:purpose
>  that returns N hits.
> If I add another condition
> +attachments:purpose +attachments:"hello world"
> I still get some hits, but if the words in the "hello world" phrase
> happen to be all stop words I get 0 hits.


Re: all stop words in exact phrase get 0 hits

Posted by Dan Funk <da...@danandtamara.com>.
That is certainly the behaviour I would expect. The "+" means the term or
phrase is required - you are requiring words that are not stored in your
index.

Why not remove the "+"?  Alternatively, you could run the search and, if no
matches are found, run it again without the second argument.  I've found
Lucene to be fairly fast at finding 0 results.
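
The retry idea can be sketched generically. The searcher function below
is a hypothetical stand-in for whatever runs a query string against the
index and returns hits; the class and method names are made up for
illustration and are not part of Lucene.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: try the full query first, then fall back to the
// required part alone if the full query matches nothing.
final class FallbackSearch {
    private FallbackSearch() {}

    static <R> List<R> search(String required,
                              String extra,
                              Function<String, List<R>> searcher) {
        List<R> hits = searcher.apply(required + " " + extra);
        if (hits.isEmpty()) {
            // Second attempt without the extra clause.
            hits = searcher.apply(required);
        }
        return hits;
    }
}
```

This costs a second search only when the first one comes back empty, at
the price of silently dropping the extra condition in that case.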


On 12/15/05, javier muguruza <jm...@gmail.com> wrote:
>
> Hi,
>
> Suppose I have a query like this:
> +attachments:purpose
> that returns N hits.
> If I add another condition
> +attachments:purpose +attachments:"hello world"
> I still get some hits, but if the words in the "hello world" phrase
> happen to be all stop words I get 0 hits.