
Posted to solr-dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2007/05/24 09:32:54 UTC

Re: [Solr Wiki] Update of "MoreLikeThis" by ryan

: + If termVectors are not stored, !MoreLikeThis will generate terms from
: stored fields.  If multiple fields are used for similarity, solr will
: use the default Analyzer -- NOTE: this may or ''may not'' match the
: Analyzer used to index the field.  If only one field is used for
: similarity, solr will use the Analyzer defined in schema.xml

what do you mean by the "default Analyzer" .. is that StandardAnalyzer,
IndexSchema.getAnalyzer(), or IndexSchema.getQueryAnalyzer()? ... in the
case of the latter two, they will automatically pick the correct Analyzer for
the FieldType.
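To make the per-field idea concrete, here is a minimal, self-contained Java sketch of an analyzer that dispatches on the field name, in the spirit of what is described above for IndexSchema.getAnalyzer() and getQueryAnalyzer(). It is not Solr code: PerFieldAnalyzerSketch, register(), analyze() and the two toy analyzers are hypothetical names, and an "analyzer" is modeled as a plain function from text to tokens rather than a Lucene Analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch (not Solr's IndexSchema): a schema-wide analyzer that
// delegates to the analyzer registered for each field, with a fallback.
class PerFieldAnalyzerSketch {
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldAnalyzerSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Roughly analogous to a field declaring its FieldType in schema.xml.
    void register(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
    }

    // The caller names the field and the matching analyzer is picked
    // automatically, so one object can serve several similarity fields.
    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, fallback).apply(text);
    }

    // Toy analyzer: lowercase, split on whitespace, drop empty tokens.
    static List<String> lowercaseWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!tok.isEmpty()) {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    // Toy analyzer: the whole value is a single token.
    static List<String> keyword(String text) {
        return List.of(text);
    }
}
```

With this shape, a MoreLikeThis implementation could be handed one schema-wide object and still analyze each similarity field with that field's own analyzer, instead of one "default" analyzer that may not match how the field was indexed.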

(although an interesting question is what happens if i want to find
similar docs based on a field that is stored but not indexed, so it *really*
has no analyzer)

-Hoss

Re: [Solr Wiki] Update of "MoreLikeThis" by ryan

Posted by Ryan McKinley <ry...@gmail.com>.

Chris Hostetter wrote:
> : > (although an interesting question is what happens if i want to find
> : > similar docs based on a field that is stored but not indexed so it *really*
> : > has no analyzer)
> 
> : I think the MLT implementation would need some modification to support
> : that -- what you are suggesting is to get the top tf/idf terms for a
> : stored but not indexed field then query against a different field (that
> : is indexed).  As is, it compares like fields to one another...
> 
> ah ... i didn't know that about MLT ... i thought you could tell it to
> find "words" from a set of source fields and then query them against a
> single target field.
> 

That would be something we could add to the Solr MoreLikeThisHelper...
the contrib MoreLikeThis can take text (a Reader) as input.  It is only when
you use a Document as the input that you are locked into the same fields.
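A rough sketch of the cross-field variant being discussed: score the terms of a source field's text (for example a stored-but-not-indexed field) by tf * idf, keep the top few, and aim the resulting query at a different, indexed target field. This is an illustration under stated assumptions, not the contrib MoreLikeThis or Solr's MoreLikeThisHelper: the class and method names are hypothetical, document frequencies are passed in as a map (in practice they would come from the target field's index), and the idf formula, ln(numDocs / (docFreq + 1)) + 1, is assumed, modeled on Lucene's classic similarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch (not Lucene's MoreLikeThis): pick the top tf*idf terms
// from one field's text and build a query against a different target field.
class CrossFieldMltSketch {

    // Assumed idf, modeled on Lucene's classic formula: ln(N / (df + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // docFreqs: term -> number of docs containing it in the TARGET field.
    static List<String> topTerms(String sourceText, Map<String, Integer> docFreqs,
                                 int numDocs, int maxTerms) {
        // Term frequencies in the source text (toy analysis: lowercase, split on non-word chars).
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : sourceText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1, Integer::sum);
            }
        }
        // Score = tf * idf; terms missing from docFreqs are treated as docFreq 0.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            score.put(e.getKey(), e.getValue() * idf(df, numDocs));
        }
        List<String> terms = new ArrayList<>(score.keySet());
        // Highest score first; ties broken alphabetically so output is deterministic.
        terms.sort((a, b) -> {
            int cmp = Double.compare(score.get(b), score.get(a));
            return cmp != 0 ? cmp : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }

    // e.g. ("body", [dog, fox]) -> "body:dog body:fox"
    static String buildQuery(String targetField, List<String> terms) {
        return terms.stream()
                .map(t -> targetField + ":" + t)
                .collect(Collectors.joining(" "));
    }
}
```

Rare terms in the source text outrank frequent ones such as "the" even when the frequent word occurs more often, and the query string only ever names the target field, so the source field never needs to be indexed itself.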

> 
> -Hoss
> 
>

Re: [Solr Wiki] Update of "MoreLikeThis" by ryan

Posted by Chris Hostetter <ho...@fucit.org>.

: > (although an interesting question is what happens if i want to find
: > similar docs based on a field that is stored but not indexed so it *really*
: > has no analyzer)

: I think the MLT implementation would need some modification to support
: that -- what you are suggesting is to get the top tf/idf terms for a
: stored but not indexed field then query against a different field (that
: is indexed).  As is, it compares like fields to one another...

ah ... i didn't know that about MLT ... i thought you could tell it to
find "words" from a set of source fields and then query them against a
single target field.


-Hoss

Re: [Solr Wiki] Update of "MoreLikeThis" by ryan

Posted by Ryan McKinley <ry...@gmail.com>.

Chris Hostetter wrote:
> : + If termVectors are not stored, !MoreLikeThis will generate terms from
> : stored fields.  If multiple fields are used for similarity, solr will
> : use the default Analyzer -- NOTE: this may or ''may not'' match the
> : Analyzer used to index the field.  If only one field is used for
> : similarity, solr will use the Analyzer defined in schema.xml
> 
> what do you mean by the "default Analyzer" .. is that StandardAnalyzer,
> IndexSchema.getAnalyzer(), or IndexSchema.getQueryAnalyzer() ? ... in the
> case of the latter two they will automatically pick the correct Analyzer for
> the FieldType.
> 

Ahhh!  I didn't realize that was how those worked.  Currently I only set
the analyzer when there is exactly one field, using
fieldType.getAnalyzer(); a better solution is to use
searcher.getSchema().getAnalyzer().

In that case, the comment should read something like:

"If termVectors are not stored, !MoreLikeThis will generate terms from 
stored fields using the Analyzer defined in schema.xml."
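The fallback in that wording can be sketched as follows: use a field's term vector when one was stored, otherwise run the stored value through the field's analyzer and count tokens. This is a hypothetical illustration in plain Java, not Solr's implementation; TermSourceSketch, termFreqs() and simpleAnalyzer() are made-up names, a term vector is modeled as a map from term to frequency, and simpleAnalyzer() only stands in for whatever Analyzer schema.xml defines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Solr's MoreLikeThisHelper: choose where the
// similarity terms for one field come from.
class TermSourceSketch {

    // Stand-in for a schema.xml analyzer: lowercase, split on non-word chars.
    static List<String> simpleAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!tok.isEmpty()) {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    // termVector is null when the field was indexed without term vectors.
    static Map<String, Integer> termFreqs(Map<String, Integer> termVector,
                                          String storedValue,
                                          Function<String, List<String>> analyzer) {
        if (termVector != null) {
            // Frequencies were already computed at index time; no analysis needed.
            return new HashMap<>(termVector);
        }
        Map<String, Integer> freqs = new HashMap<>();
        if (storedValue == null) {
            return freqs; // neither a term vector nor a stored value: nothing to use
        }
        // Re-analyze the stored text and count each token.
        for (String tok : analyzer.apply(storedValue)) {
            freqs.merge(tok, 1, Integer::sum);
        }
        return freqs;
    }
}
```

A field that is stored but not indexed would reach the second branch with a stored value but, as the question below points out, no analyzer of its own, so in this sketch the caller would have to choose one explicitly.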

> (although an interesting question is what happens if i want to find
> similar docs based on a field that is stored but not indexed so it *really*
> has no analyzer)
> 

I think the MLT implementation would need some modification to support 
that -- what you are suggesting is to get the top tf/idf terms for a 
stored but not indexed field then query against a different field (that 
is indexed).  As is, it compares like fields to one another...