Posted to solr-dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2007/05/24 09:32:54 UTC
Re: [Solr Wiki] Update of "MoreLikeThis" by ryan
: + If termVectors are not stored, !MoreLikeThis will generate terms from
: stored fields. If multiple fields are used for similarity, solr will
: use the default Analyzer -- NOTE: this may or ''may not'' match the
: Analyzer used to index the field. If only one field is used for
: similarity, solr will use the Analyzer defined in schema.xml
what do you mean by the "default Analyzer" .. is that StandardAnalyzer,
IndexSchema.getAnalyzer(), or IndexSchema.getQueryAnalyzer() ? ... in the
case of the latter two they will automatically pick the correct Analyzer for
the FieldType.
(although an interesting question is what happens if i want to find
similar docs based on a field that is stored but not indexed so it *really*
has no analyzer)
-Hoss
Re: [Solr Wiki] Update of "MoreLikeThis" by ryan
Posted by Ryan McKinley <ry...@gmail.com>.
Chris Hostetter wrote:
> : > (although an interesting question is what happens if i want to find
> : > similar docs based on a field that is stored but not indexed so it *really*
> : > has no analyzer)
>
> : I think the MLT implementation would need some modification to support
> : that -- what you are suggesting is to get the top tf/idf terms for a
> : stored but not indexed field then query against a different field (that
> : is indexed). As is, it compares like fields to one another...
>
> ah ... i didn't know that about MLT ... i thought you could tell it to
> find "words" from a set of source fields and then query them against a
> single target field.
>
That would be something we could add to the solr MoreLikeThisHelper...
the contrib MoreLikeThis can take text/reader as the input. It is only when
you use a Document as the input that you are locked into the same fields.
>
> -Hoss
>
>
Re: [Solr Wiki] Update of "MoreLikeThis" by ryan
Posted by Chris Hostetter <ho...@fucit.org>.
: > (although an interesting question is what happens if i want to find
: > similar docs based on a field that is stored but not indexed so it *really*
: > has no analyzer)
: I think the MLT implementation would need some modification to support
: that -- what you are suggesting is to get the top tf/idf terms for a
: stored but not indexed field then query against a different field (that
: is indexed). As is, it compares like fields to one another...
ah ... i didn't know that about MLT ... i thought you could tell it to
find "words" from a set of source fields and then query them against a
single target field.
-Hoss
Re: [Solr Wiki] Update of "MoreLikeThis" by ryan
Posted by Ryan McKinley <ry...@gmail.com>.
Chris Hostetter wrote:
> : + If termVectors are not stored, !MoreLikeThis will generate terms from
> : stored fields. If multiple fields are used for similarity, solr will
> : use the default Analyzer -- NOTE: this may or ''may not'' match the
> : Analyzer used to index the field. If only one field is used for
> : similarity, solr will use the Analyzer defined in schema.xml
>
> what do you mean by the "default Analyzer" .. is that StandardAnalyzer,
> IndexSchema.getAnalyzer(), or IndexSchema.getQueryAnalyzer() ? ... in the
> case of the latter two they will automatically pick the correct Analyzer for
> the FieldType.
>
Ahhh! I didn't realize that is how those worked. Currently I am only
setting the analyzer if there is only one field and using
fieldType.getAnalyzer() -- a better solution is to use:
searcher.getSchema().getAnalyzer()
In that case, the comment should read something like:
"If termVectors are not stored, !MoreLikeThis will generate terms from
stored fields using the Analyzer defined in schema.xml."
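
To make the per-field behaviour concrete, here is a minimal sketch of the idea that a single schema-level analyzer can dispatch to whatever analysis is registered for each field. It does not use Solr or Lucene classes: PerFieldAnalyzerSketch, register(), analyze(), lowercaseWhitespace() and keyword() are all hypothetical names invented for illustration, and the two tokenizers are simplistic stand-ins for real Analyzers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a schema-level analyzer. NOT Solr's implementation.
final class PerFieldAnalyzerSketch {
    // Field name -> analysis function configured for that field's type.
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    // Used when a field has no analysis registered.
    private final Function<String, List<String>> fallback;

    PerFieldAnalyzerSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    void register(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
    }

    // Callers pass the field name; the right analysis is chosen for them.
    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, fallback).apply(text);
    }

    // Toy "text" analysis: split on whitespace and lowercase each token.
    static List<String> lowercaseWhitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                out.add(tok.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Toy "string" analysis: the whole value is a single token.
    static List<String> keyword(String text) {
        List<String> out = new ArrayList<>();
        out.add(text);
        return out;
    }

    public static void main(String[] args) {
        PerFieldAnalyzerSketch schemaAnalyzer =
                new PerFieldAnalyzerSketch(PerFieldAnalyzerSketch::keyword);
        schemaAnalyzer.register("title", PerFieldAnalyzerSketch::lowercaseWhitespace);

        System.out.println(schemaAnalyzer.analyze("title", "Solr More Like This"));
        System.out.println(schemaAnalyzer.analyze("id", "Doc 42"));
    }
}
```

With this shape, code that handles several similarity fields needs no special case for "one field vs. many": it always asks the same object and names the field.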
> (although an interesting question is what happens if i want to find
> similar docs based on a field that is stored but not indexed so it *really*
> has no analyzer)
>
I think the MLT implementation would need some modification to support
that -- what you are suggesting is to get the top tf/idf terms for a
stored but not indexed field then query against a different field (that
is indexed). As is, it compares like fields to one another...
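
For what the cross-field variant might look like, here is a rough, library-free sketch: take the text of a source field, rank its terms by tf/idf using document frequencies from a different (indexed) target field, and build a query against that target field. Everything here is an assumption for illustration: CrossFieldMltSketch, topTerms() and buildQuery() are hypothetical names, the whitespace tokenizer stands in for a real Analyzer, the idf weighting is one common choice, and none of this is how MoreLikeThis or MoreLikeThisHelper is actually implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of "terms from a source field, query a target field".
final class CrossFieldMltSketch {

    // Rank the source text's terms by tf * idf, where idf comes from the
    // document frequencies of the TARGET field (the one that is indexed).
    static List<String> topTerms(String sourceText, Map<String, Integer> targetDocFreq,
                                 int numDocs, int maxTerms) {
        // Term frequency within the source text (toy whitespace tokenizer).
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : sourceText.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1, Integer::sum);
            }
        }

        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = targetDocFreq.getOrDefault(e.getKey(), 0);
            if (df == 0) {
                continue; // term never occurs in the target field, so it cannot match
            }
            // Assumed weighting: idf = 1 + ln(numDocs / (df + 1)).
            double idf = 1.0 + Math.log((double) numDocs / (df + 1));
            score.put(e.getKey(), e.getValue() * idf);
        }

        // Highest score first; ties broken alphabetically for determinism.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        int n = Math.max(0, Math.min(maxTerms, terms.size()));
        return new ArrayList<>(terms.subList(0, n));
    }

    // Build a simple "field:term field:term" query string for the target field.
    static String buildQuery(String targetField, List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(targetField).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new HashMap<>();
        df.put("solr", 2);
        df.put("search", 50);
        df.put("lucene", 5);
        df.put("the", 1000);

        List<String> terms = topTerms("solr search solr lucene the the the", df, 1000, 2);
        System.out.println(buildQuery("text", terms));
    }
}
```

The point of the sketch is only the decoupling: the term statistics and the final query both refer to the target field, so the source field itself does not need to be indexed.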