You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Chris Sizemore <Ch...@bbc.co.uk> on 2007/10/21 17:21:34 UTC

MoreLikeThis across multiple fields question...

hello--

i'm using MoreLikeThis. i'm trying to run the document comparison across more than one field in my index, but i'm not at all sure that it's actually happening -- when i examine the constructed query, only one field is mentioned! here's my code:

FileReader reader = new FileReader("/Users/onpause/aTest.txt");
IndexReader ir = IndexReader.open(index);
IndexSearcher is = new IndexSearcher(index);
MoreLikeThis mlt = new MoreLikeThis(ir);
String [] mltFields = new String[] {"conceptSearch","abstract","contents","conceptDisplay"};
mlt.setFieldNames(mltFields);
Analyzer analyzer = new StandardAnalyzer();
mlt.setAnalyzer(analyzer);
Query mltQuery = mlt.like(reader);
Hits hits = is.search(mltQuery);

System.out.println("query: "+ mltQuery.toString() + "   ; fields: " + mlt.getFieldNames().toString()  + " ::: params: "+mlt.describeParams());



here's what that last println prints out:


query: contents:rainbow contents:edinburgh contents:actress contents:festival contents:end   ;

fields: [Ljava.lang.String;@199939 ::: 

params: maxQueryTerms  : 25
        minWordLen     : 0
        maxWordLen     : 0
        fieldNames     : "conceptSearch, abstract, contents, conceptDisplay
        boost          : false
        minTermFreq    : 1
        minDocFreq     : 1


why does the query only contain references to the field "contents"? i expected it to contain references to all 4 fields i assigned, something like:

(contents:rainbow contents:edinburgh contents:actress contents:festival contents:end) (abstract:rainbow... etc etc

when the field "contents" is in the list, it ends up being the only one visibly used in the printed query -- the others seem suppressed, at least from the print...

i really want to make sure i'm MLT-ing across all 4 fields, because i indexed then with different boosts, hoping this would influence the MLT result scoring...

also, and i'm sure this is because i'm such i naive java user, how can i convert this object reference [Ljava.lang.String;@199939  to something more useful to read?


thanks in advance for any advice!


best--

--chris sizemore

http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.
					

RE: MoreLikeThis across multiple fields question...

Posted by Chris Sizemore <Ch...@bbc.co.uk>.
hmmm, that seems a shame, esp. if a term matches in a non-most-hits-occur-here field, and then that field's boost doesn't get called in computing the score...

so, not a bug, then -- but i question the design of the class...

i'd much rather see:

conceptSearch:foo abstract:foo conceptSearch:blah abstract:blah

because matches to conceptSearch should be weighted much higher than the other field...

any further advice?



best--

--cs


-----Original Message-----
From: Daniel Naber [mailto:lucenelist2007@danielnaber.de]
Sent: Sun 10/21/2007 7:20 PM
To: java-user@lucene.apache.org
Subject: Re: MoreLikeThis across multiple fields question...
 
On Sunday 21 October 2007 17:21, Chris Sizemore wrote:

> i'm using MoreLikeThis. i'm trying to run the document comparison across
> more than one field in my index, but i'm not at all sure that it's
> actually happening -- when i examine the constructed query, only one
> field is mentioned! here's my code:

Actually looking at MoreLikeThis.createQueue() it's supposed to use only 
that field where most hits occur for that term, so you could end up with a 
query like:

conceptSearch:foo abstract:blah

Regards
 Daniel

-- 
http://www.danielnaber.de



http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.
					

Re: MoreLikeThis across multiple fields question...

Posted by Daniel Naber <lu...@danielnaber.de>.
On Sunday 21 October 2007 17:21, Chris Sizemore wrote:

> i'm using MoreLikeThis. i'm trying to run the document comparison across
> more than one field in my index, but i'm not at all sure that it's
> actually happening -- when i examine the constructed query, only one
> field is mentioned! here's my code:

Actually looking at MoreLikeThis.createQueue() it's supposed to use only 
that field where most hits occur for that term, so you could end up with a 
query like:

conceptSearch:foo abstract:blah

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: MoreLikeThis across multiple fields question...

Posted by Daniel Naber <lu...@danielnaber.de>.
On Sunday 21 October 2007 17:21, Chris Sizemore wrote:

> i'm using MoreLikeThis. i'm trying to run the document comparison across
> more than one field in my index, but i'm not at all sure that it's
> actually happening -- when i examine the constructed query, only one
> field is mentioned! here's my code:

This looks like a bug indeed. Could you open a bug report to track this?

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org