Posted to general@lucene.apache.org by csantos <cl...@gmail.com> on 2008/12/29 15:07:24 UTC

Lucene retrieval model

Hello,

I would like to know more about Lucene's retrieval model, more specifically
the Boolean model part. Is it a standard model (only documents that match the
Boolean expression are returned) or an extended model (the search result
includes all documents that satisfy the given conditions, regardless of the
Boolean connectors AND, OR and NOT)?

On the Apache Lucene - Scoring page I did not find much about it, only this:
"Lucene scoring uses a combination of the Vector Space Model (VSM) of
Information Retrieval and the Boolean model to determine how relevant a
given Document is to a User's query. In general, the idea behind the VSM is
the more times a query term appears in a document relative to the number of
times the term appears in all the documents in the collection, the more
relevant that document is to the query. It uses the Boolean model to first
narrow down the documents that need to be scored based on the use of boolean
logic in the Query specification. Lucene also adds some capabilities and
refinements onto this model to support boolean and fuzzy searching, but it
essentially remains a VSM based system at the heart."
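The two-phase idea in that passage (a Boolean filter first, vector-space-style scoring second) can be illustrated with a toy Java sketch. It is not Lucene's code or its exact formula: documents are plain term-frequency maps, the query is reduced to required and prohibited terms (roughly what Lucene calls MUST and MUST_NOT clauses), and the idf smoothing `1 + ln(N / df)` is an assumption chosen for the sketch. The class and method names are hypothetical. Requires Java 9+ for `List.of`/`Map.of`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy two-phase retrieval: a Boolean filter picks the candidates, then a
// simple tf-idf sum ranks them. Illustration only, NOT Lucene's formula.
final class TwoPhaseSearch {

    // Phase 1 (Boolean model): keep a document only if it contains every
    // required term and none of the prohibited terms.
    static boolean matches(Map<String, Integer> doc,
                           List<String> required, List<String> prohibited) {
        for (String t : required) {
            if (!doc.containsKey(t)) return false;
        }
        for (String t : prohibited) {
            if (doc.containsKey(t)) return false;
        }
        return true;
    }

    // Phase 2 (VSM-style): sum tf * idf over the query terms the document contains.
    // idf = 1 + ln(N / df) is an assumed smoothing choice for this sketch.
    static double score(Map<String, Integer> doc, List<String> terms,
                        List<Map<String, Integer>> corpus) {
        double s = 0.0;
        for (String t : terms) {
            int tf = doc.getOrDefault(t, 0);
            long df = corpus.stream().filter(d -> d.containsKey(t)).count();
            if (tf == 0 || df == 0) continue; // term absent: contributes nothing
            s += tf * (1.0 + Math.log((double) corpus.size() / df));
        }
        return s;
    }

    // Returns the indices of matching documents, best score first.
    static List<Integer> search(List<Map<String, Integer>> corpus,
                                List<String> required, List<String> prohibited) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < corpus.size(); i++) {
            if (matches(corpus.get(i), required, prohibited)) hits.add(i);
        }
        hits.sort(Comparator.comparingDouble(
                (Integer d) -> score(corpus.get(d), required, corpus)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        // Each document is a map of term -> term frequency.
        List<Map<String, Integer>> corpus = List.of(
                Map.of("a", 2, "c", 1),
                Map.of("a", 1, "b", 3),
                Map.of("a", 1),
                Map.of("c", 1));
        // Query "a NOT b": document 1 is filtered out, document 0 outranks 2.
        System.out.println(search(corpus, List.of("a"), List.of("b"))); // prints [0, 2]
    }
}
```

The point of the split is that the Boolean phase alone decides *whether* a document is returned; the scoring phase only decides the *order* among documents that already passed.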

Thanks in advance for any responses.

-- 
View this message in context: http://www.nabble.com/Lucene-retrieval-model-tp21203662p21203662.html
Sent from the Lucene - General mailing list archive at Nabble.com.


Re: Lucene retrieval model

Posted by Claudia Santos <cl...@gmail.com>.
Hello,

Thanks for the tip.
The idea of the extended Boolean model is that a weight between 0 and 1 is
calculated for every search result that contains at least one of the terms.
The extended model gives a document containing only one of the terms a
lower value than a document containing both. A NOT B would have value 0.
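One common formulation of such an extended Boolean model is the p-norm model of Salton, Fox and Wu (1983). Assuming that is the kind of model meant here, the following toy Java sketch computes p-norm OR and AND scores for binary term weights; it is not Lucene code, the class name is hypothetical, and NOT is deliberately left out.

```java
// Toy sketch of p-norm extended Boolean operators (Salton, Fox and Wu).
// Term weights are assumed to lie in [0, 1]. Not Lucene code.
final class PNorm {

    // p-norm OR: 0 when no term matches, 1 when every term matches fully.
    static double or(double p, double... w) {
        double sum = 0.0;
        for (double x : w) {
            sum += Math.pow(x, p);
        }
        return Math.pow(sum / w.length, 1.0 / p);
    }

    // p-norm AND: reaches 1 only when every term matches fully.
    static double and(double p, double... w) {
        double sum = 0.0;
        for (double x : w) {
            sum += Math.pow(1.0 - x, p);
        }
        return 1.0 - Math.pow(sum / w.length, 1.0 / p);
    }

    public static void main(String[] args) {
        double p = 2.0;
        // Binary term weights for the query terms A and B.
        System.out.println("A OR B,  doc has A only:  " + or(p, 1.0, 0.0));  // about 0.707
        System.out.println("A OR B,  doc has A and B: " + or(p, 1.0, 1.0));  // 1.0
        System.out.println("A AND B, doc has A only:  " + and(p, 1.0, 0.0)); // about 0.293
        System.out.println("A AND B, doc has A and B: " + and(p, 1.0, 1.0)); // 1.0
    }
}
```

A document with only one of the terms still gets a nonzero score under AND, but less than a document with both, which is the behaviour described above. With p = 1 both operators reduce to the mean of the weights; as p grows they approach strict Boolean max/min.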
Regards,

----- Original Message ----- 
From: "Steven A Rowe" <sa...@syr.edu>
To: <ge...@lucene.apache.org>
Sent: Monday, December 29, 2008 8:35 PM
Subject: RE: Lucene retrieval model


Hi csantos,

Very few people are subscribed to the general@lucene.apache.org mailing 
list - you'll get a much better response if you use the java-user@l.a.o list 
instead.

On 12/29/2008 at 9:07 AM, csantos wrote:
> I would like to know more about Lucene's retrieval model,
> more specifically about the boolean model part, is that a
> standard model (just documents that match the boolean
> expression) or an extended model (include in the search
> result all Documents which correspond to the given
> conditions, regardless of the boolean connectors - AND,
> OR, NOT) ?

I'm not familiar with your use of the terms "standard model" and "extended 
model", so take my response here with a grain of salt.

There is no way I know of to include documents in the search results that 
violate the constraints represented by the connectors you use.  But if 
you're interested in getting all documents that match a query, can't you 
simply use all OR connectors?
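With all-OR queries (in Lucene terms, a BooleanQuery whose clauses are all SHOULD), every document containing at least one term is returned, and the classic similarity's coord factor (matched clauses / total clauses) rewards documents that match more terms. The toy Java sketch below imitates only that effect; the per-term weight of 1.0 and the class name are hypothetical simplifications, not Lucene's actual scoring.

```java
import java.util.List;
import java.util.Set;

// Toy all-OR query scorer with a coord-style factor. Not Lucene code.
final class OrQueryCoord {

    static double score(Set<String> doc, List<String> query) {
        int matched = 0;
        double sum = 0.0;
        for (String t : query) {
            if (doc.contains(t)) {
                matched++;
                sum += 1.0; // hypothetical per-term weight; real engines use tf-idf
            }
        }
        // coord: fraction of query terms the document matches
        double coord = (double) matched / query.size();
        return sum * coord;
    }

    public static void main(String[] args) {
        List<String> query = List.of("a", "b");
        System.out.println(score(Set.of("a"), query));      // 0.5
        System.out.println(score(Set.of("a", "b"), query)); // 2.0
        System.out.println(score(Set.of("c"), query));      // 0.0 (not a hit at all)
    }
}
```

This gives graded scores similar in spirit to the extended model for OR, while a document matching no term is simply not a hit.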

Out of curiosity, how useful would it be for the query "A NOT B" to return 
documents that match "B"?

Steve 

