Posted to user@lucy.apache.org by goran kent <go...@gmail.com> on 2011/10/23 08:17:40 UTC

[lucy-user] Change default Boolean operator from OR to AND (via default_boolop)

Hi,

This is not directly related to Lucy per se, but a good place to ask:

In my experience with search engines in general, I've noticed that a
search such as

site:test.com bob

will restrict the results to pages from test.com (the + seems to be
implied on the site field, so the search seems to become:
+site:test.com bob).

In fact, google seems to go further.  My tests show that google
changes the above query to:

+site:test.com +bob

I'm trying to mimic the expected behaviour as closely as possible so
as not to frustrate/alienate my users.

So, fiddle with the query terms behind the scenes and transform them
to +site:test.com +bob...  an idea which doesn't feel right.
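
For illustration only, here is a minimal sketch of that "fiddle behind the scenes" idea in plain Perl. The helper name require_all_terms is hypothetical and not part of Lucy; it simply splits on whitespace and prefixes each bare term with '+'. It makes no attempt to handle quoted phrases, parentheses, or literal AND/OR/NOT operators, which is part of why the approach feels fragile.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Lucy API): mark every bare term as required
# by prefixing it with '+'. Terms that already start with '+' or '-' are
# left untouched. Assumes a simple whitespace-separated query; quoted
# phrases, parentheses and boolean keywords are NOT handled.
sub require_all_terms {
    my ($query_string) = @_;
    my @terms = split ' ', $query_string;    # also skips leading whitespace
    return join ' ', map { /^[+-]/ ? $_ : "+$_" } @terms;
}

print require_all_terms('site:test.com bob'), "\n";    # +site:test.com +bob
```

The rewritten string would then be handed to the query parser as usual. Setting default_boolop avoids this preprocessing step entirely.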

...or, change the default QueryParser behaviour from OR to AND:

my $query_parser = Lucy::Search::QueryParser->new(
    schema => $schema,
    default_boolop => 'AND',
);

I have a feeling that google is defaulting to AND for most cases.

Comments, experience, gut-feel?

Thanks

Re: [lucy-user] Change default Boolean operator from OR to AND (via default_boolop)

Posted by goran kent <go...@gmail.com>.
Thanks for the comments Nate!

The easiest option turns out to be default_boolop => 'AND'.

It also most closely "resembles" the google search results in my tests... :)

...re Backrub, I feel your pain.  Not only does google appear to be
flipflopping like a fish on the deck w.r.t. consistency, but their UI
is also all over the gd place with popups and flashy icons and shit.
yes, some of it can be turned off, but for suck fakes, they're losing
me...

They're beginning to feel like a needy ex-girlfriend who's all over
you like a clingy wet blanket.


Re: [lucy-user] Change default Boolean operator from OR to AND (via default_boolop)

Posted by Nathan Kurz <na...@verse.com>.
On Sat, Oct 22, 2011 at 11:17 PM, goran kent <go...@gmail.com> wrote:
> In fact, google seems to go further.  My tests show that google
> changes the above query to:
>
> +site:test.com +bob

You're so last week about this, Goran! :)

Since we last wrote, Google has changed their behaviour to
disallow '+required' in queries and now gives an error message telling
you to use "required" with double quotes:
https://news.ycombinator.com/item?id=3140797

That aside, I think it actually converted to ~bob, which I think still works.

> I'm trying to mimic the expected behaviour as closely as possible so
> as not to frustrate/alienate my users.

Seeing as Google has changed their long-standing behaviour several
times (from all words required, to stemming allowed, to synonyms by
default, to making it quite hard to "get" "exact" "results") I
wouldn't worry too much about it.   Normal users don't ever use
special features, and even quoted phrases are only used by a tiny
minority.  Heck, I'm sure there are some users who never use multiple
terms.

Advanced users want it to work correctly, and don't really care what
Google is currently doing.

> So, fiddle with the query terms behind the scenes and transform them
> to +site:test.com +bob...  an idea which doesn't feel right.

Certainly the easiest approach.  Maybe do this until you can test it
with real users?

> ...or, change the default QueryParser behaviour from OR to AND:
>
> my $query_parser = Lucy::Search::QueryParser->new(
>    schema => $schema,
>    default_boolop => 'AND',
> );
>
> I have a feeling that google is defaulting to AND for most cases.

They used to, back when they catered to experienced users.  Currently,
they are a lot more free-form.  For the first time since Backrub, I'm
actively searching for a new search engine.

But like Marvin says, do what's right for your data set and your
users.  Personally, I'm a firm believer in AND, and that all non-exact
matches should be clearly marked as such.

--nate

Re: [lucy-user] Change default Boolean operator from OR to AND (via default_boolop)

Posted by goran kent <go...@gmail.com>.
On Sun, Oct 23, 2011 at 9:14 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Sun, Oct 23, 2011 at 08:17:40AM +0200, goran kent wrote:
>> Comments, experience, gut-feel?
>
> OR is better for small corpuses.  AND is better for large corpuses.
>
> I suspect you're well into AND territory.

I suspect you're on the money, Marvin.  It's useful to have it confirmed.

Re: [lucy-user] Change default Boolean operator from OR to AND (via default_boolop)

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Oct 23, 2011 at 08:17:40AM +0200, goran kent wrote:
> Comments, experience, gut-feel?

OR is better for small corpuses.  AND is better for large corpuses.

I suspect you're well into AND territory.

Marvin Humphrey