Posted to java-user@lucene.apache.org by Li Li <fa...@gmail.com> on 2010/06/01 16:07:36 UTC

how to extend Similarity in this situation?

I want to support only boolean OR queries (as many search engines do), but
I also want to boost documents in which the query terms are closer together.
E.g. the query terms are 'apache lucene':
doc1: apache has many projects such as lucene
doc2: The Apache HTTP Server Project is an effort to develop and
maintain an ...  Lucene is a .....
I want to give doc1 a bigger boost.
So I need a place where I can get all the position information.
How can I extend Similarity to achieve this?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: how to extend Similarity in this situation?

Posted by Li Li <fa...@gmail.com>.
thank you

2010/6/2 Rebecca Watson <be...@gmail.com>:
> [...]



Re: how to extend Similarity in this situation?

Posted by Rebecca Watson <be...@gmail.com>.
Hi Li Li

If you want to support some query types and not others, you should
override/extend the queryparser so that it throws an exception or builds
a different query type instead.

Similarity doesn't do the actual scoring; it's used by the Query
classes (actually the Scorer implementation used by the Query)
to get tf / idf scores. So scoring is done by the Query types themselves
(the Weight/Scorer classes used by the Query). Sometimes
you change/extend Similarity to affect scoring, and sometimes you
need to change/extend the Query/Scorer/Weight classes etc.
Luckily though, there are already some Query types that boost
scores if the terms are closer: the SpanQuery query types do this.
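
To make the "closer terms score higher" idea concrete, here is a rough
sketch in plain Java (no Lucene dependency). It is not Lucene's Scorer
code. It only mirrors the 1 / (distance + 1) shape of
DefaultSimilarity.sloppyFreq; how Lucene actually measures the distance
depends on the query type. The class name, the minDistance helper and the
token positions for doc2 are made up for illustration (the doc2 text in the
question is elided, so its real positions are unknown).

```java
class ProximitySketch {

    /**
     * Smallest gap between any position of term A and any position of term B.
     * Both arrays are assumed to be sorted ascending and non-empty.
     */
    static int minDistance(int[] a, int[] b) {
        int i = 0, j = 0, best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            // Advance whichever list currently has the smaller position.
            if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return best;
    }

    /** Same shape as DefaultSimilarity.sloppyFreq: closer matches count more. */
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // doc1: "apache has many projects such as lucene"
        // apache is token 0, lucene is token 6.
        int d1 = minDistance(new int[] {0}, new int[] {6});

        // doc2: hypothetical positions, assuming "lucene" appears far from "apache".
        int d2 = minDistance(new int[] {1}, new int[] {40});

        System.out.println("doc1 distance=" + d1 + " weight=" + sloppyFreq(d1));
        System.out.println("doc2 distance=" + d2 + " weight=" + sloppyFreq(d2));
    }
}
```

With these assumed positions doc1 gets the larger proximity weight, which
is the ordering asked for in the question.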

You will either have to write your own queryparser (to convert the query string
to the Query object(s) required),
or you could use e.g. the Surround queryparser, which supports specifying
SpanQuery queries via user-entered text:
http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers/

Look in the lucene contrib/surround package for the surround queryparser.

I've messed about with the SpanQuery classes a bit. People report
that they are slower, which is true because they use a lot of extra
info to perform their matching/scoring, but they are still easily
sub-second over most queries on a reasonably sized doc collection.
They only get a lot slower if you have terms that occur very frequently
in the document collection you are searching.
So if you are removing stop words etc. you shouldn't have too
many problems efficiency-wise.

hope that helps,

bec :)

