Posted to user@lucenenet.apache.org by Nenad Maricic <ne...@gmail.com> on 2010/04/28 18:43:01 UTC

New user: multi fields query question

Hi, 

I hope there is a solution for this:

We have an index with the following fields in each document: title, keyword, and
category.

We are using our own custom similarity class, with all factors set to 1.0 (tf,
idf, lengthNorm) except coord. Also, we have boosted the "keyword" field during
indexing.

We are querying multiple fields using BooleanQuery. The problem is that for a
query like this

title:X  OR keyword:X  OR category:X

and doc: 

title: X Y Z

keyword: X Y Z

category: X Y Z

we get 3 matches for the word X.

The result we want to get is one match per doc for the word, regardless of
the field it is found in (even in the case this word is contained in more
than one field). 

The final goal is for documents with more matches to get a higher score,
and, within every group of documents with the same score, for
documents with more matches in the "keyword" field to appear at the top.
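
The ordering described above can be written down as a two-level sort key. The following is a minimal Python sketch of the intended ordering only, not of Lucene's scoring. The document layout (a dict from field name to token list), the function names, and the reading of "more matches" as "more distinct query terms matched" are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical document layout: field name -> list of tokens.
Doc = Dict[str, List[str]]


def rank_key(doc: Doc, query_terms: List[str]) -> Tuple[int, int]:
    """Return (terms matched in any field, terms matched in 'keyword').

    Each query term counts at most once per document, no matter how many
    fields contain it.
    """
    terms = set(query_terms)
    all_tokens = {tok for tokens in doc.values() for tok in tokens}
    keyword_tokens = set(doc.get("keyword", []))
    return (len(terms & all_tokens), len(terms & keyword_tokens))


def rank(docs: List[Doc], query_terms: List[str]) -> List[Doc]:
    """Sort by match count first, then by matches in the 'keyword' field."""
    return sorted(docs, key=lambda d: rank_key(d, query_terms), reverse=True)


# Illustrative documents (made up for this sketch).
doc_a = {"title": ["x", "y"], "keyword": ["z"], "category": ["y"]}
doc_b = {"title": ["y"], "keyword": ["x"], "category": ["z"]}
doc_c = {"title": ["x"], "keyword": ["x", "y"], "category": []}
doc_d = {"title": ["x"], "keyword": ["x"], "category": ["x"]}

ranked = rank([doc_a, doc_b, doc_c, doc_d], ["x", "y"])
```

In this sketch doc_d contains "x" in all three fields but still counts as a single match, so it sorts below the documents that match both terms.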

We heard that in such a case DisjunctionMaxQuery should be used, but we don't
know exactly how to use it.
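
For context, DisjunctionMaxQuery scores a document with the maximum score of its matching sub-queries, plus a tie-breaker multiplier times the scores of the other matching sub-queries. One common arrangement is to build one DisjunctionMaxQuery per search term across all fields and add each of them as an optional clause of a BooleanQuery, so that coord counts terms rather than fields. The sketch below is a simplified Python model of that arithmetic, not Lucene.NET API calls. It assumes tf, idf, and lengthNorm are all 1.0 as in the post, ignores queryNorm, and uses a made-up boost of 2.0 for "keyword".

```python
from typing import Dict, List

# Hypothetical document layout: field name -> list of tokens.
Doc = Dict[str, List[str]]

# Assumed boosts for illustration; only "keyword" is boosted, as in the post.
BOOSTS = {"title": 1.0, "keyword": 2.0, "category": 1.0}


def dismax_term_score(doc: Doc, term: str, tie_breaker: float = 0.0) -> float:
    """Model of one per-term disjunction-max group across all fields.

    With tf = idf = lengthNorm = 1.0, a field that contains the term
    contributes just its boost. The group keeps the best field score and
    adds tie_breaker times the remaining field scores.
    """
    subscores = [BOOSTS.get(field, 1.0)
                 for field, tokens in doc.items() if term in tokens]
    if not subscores:
        return 0.0
    best = max(subscores)
    return best + tie_breaker * (sum(subscores) - best)


def grouped_score(doc: Doc, terms: List[str], tie_breaker: float = 0.0) -> float:
    """Model of a boolean OR over per-term groups, with coord applied."""
    per_term = [dismax_term_score(doc, t, tie_breaker) for t in terms]
    matched = sum(1 for s in per_term if s > 0)
    coord = matched / len(terms)  # counts terms, not fields
    return coord * sum(per_term)


all_fields = {"title": ["x", "y", "z"],
              "keyword": ["x", "y", "z"],
              "category": ["x", "y", "z"]}
title_only = {"title": ["x"], "keyword": [], "category": []}

score_all = grouped_score(all_fields, ["x", "y"])
score_title = grouped_score(title_only, ["x", "y"])
```

With a tie-breaker of 0.0, a term found in all three fields contributes once, at the boost of its best field. Note that in this model a large enough "keyword" boost can outweigh the number of matched terms, so the sketch shows the mechanism only and does not guarantee the ordering asked for.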

The Lucene.NET version we are using is 2.4. 

 

Regards, 

Nenad


RE: New user: multi fields query question

Posted by Nenad Maričić <ne...@gmail.com>.
Maybe my question wasn't clear enough. I know that I should have more than
one indexed doc (we have millions of db records indexed). What I want to get
when the following query is executed:

title:X OR keyword:X OR category:X

on the documents: 

title= X y z; keyword= X z; category= X y
title= X y z; keyword= X z; category= y
title= X y z; keyword= z; category= y

is that all documents have the same coord factor (1/3).

Normally, the first doc would have coord 3/3, the second 2/3, and the third 1/3.
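
The coord values above can be reproduced with a few lines of arithmetic. This is a minimal Python sketch that assumes coord is the number of matching top-level clauses divided by the total number of clauses; the function names are hypothetical. It also shows what happens when the three field clauses are grouped into one clause per term: every document that contains X anywhere gets the same coord (1/1 in this model, rather than the 1/3 mentioned above).

```python
from fractions import Fraction

FIELDS = ("title", "keyword", "category")

# The three example documents from this message.
docs = [
    {"title": ["X", "y", "z"], "keyword": ["X", "z"], "category": ["X", "y"]},
    {"title": ["X", "y", "z"], "keyword": ["X", "z"], "category": ["y"]},
    {"title": ["X", "y", "z"], "keyword": ["z"], "category": ["y"]},
]


def flat_coord(doc, term):
    """coord for 'title:T OR keyword:T OR category:T' (one clause per field)."""
    matching = sum(1 for field in FIELDS if term in doc[field])
    return Fraction(matching, len(FIELDS))


def grouped_coord(doc, terms):
    """coord when each term is a single clause, whatever field it matches in."""
    matched = sum(1 for t in terms if any(t in doc[f] for f in FIELDS))
    return Fraction(matched, len(terms))


flat = [flat_coord(d, "X") for d in docs]          # 3/3, 2/3, 1/3
grouped = [grouped_coord(d, ["X"]) for d in docs]  # identical for all three
```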

For the users of our app it is not important how many times a term is
contained in the doc or in how many fields it is found; the only thing that
matters is whether the doc contains the term or not.

Thanks for the reply

Nenad


-----Original Message-----
From: Digy [mailto:digydigy@gmail.com] 
Sent: Wednesday, April 28, 2010 6:59 PM
To: lucene-net-user@lucene.apache.org
Subject: RE: New user: multi fields query question

Hi Nenad,
To get 3 matches, you must have indexed 3 documents (either identical docs in
the index or 3 docs each having one field). Otherwise it should have worked
as you expected.
For the second part, other than boosting "keyword", I don't think that you
have to do anything extra.

DIGY



RE: New user: multi fields query question

Posted by Digy <di...@gmail.com>.
Hi Nenad,
To get 3 matches, you must have indexed 3 documents (either identical docs in
the index or 3 docs each having one field). Otherwise it should have worked
as you expected.
For the second part, other than boosting "keyword", I don't think that you
have to do anything extra.

DIGY
