Posted to solr-user@lucene.apache.org by Andy Pickler <an...@gmail.com> on 2013/05/22 20:20:29 UTC

MoreLikeThis - No Results

I'm developing a recommendation feature in our app using the
MoreLikeThisHandler <http://wiki.apache.org/solr/MoreLikeThisHandler>, and
so far it is doing a great job.  We're using a user's "competency keywords"
as the MLT field list and the user's corresponding document in Solr as the
"comparison document".  I have found that for one user I'm not receiving
any recommendations, and I'm not sure why.

Solr: 4.1.0

*relevant schema*:

<field name="competencyKeywords" type="short-mlt-text" indexed="true"
stored="true" multiValued="true" termVectors="true"/>

    <fieldType name="short-mlt-text" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

*user's values*:

<arr name="competencyKeywords">
<str>Healthcare Cost Trends</str>
</arr>

Is it possible that, among the ~40,000 users in this index (about 500 of
which have the same competency keywords), the words "healthcare",
"cost" and "trends" are simply judged by Lucene not to be "significant"?  I
realize that I may not understand how the MLT Handler is doing things under
the covers...I've only been guessing until now based on the (otherwise
excellent) results I've been seeing.

Thanks,
Andy Pickler

P.S.  For some additional information, the following query:

/mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.interestingTerms=details&debugQuery=true&mlt.match.include=false

...produces the following results...

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="interestingTerms"/>
<lst name="debug">
<str name="rawquerystring">objectId:user91813</str>
<str name="querystring">objectId:user91813</str>
<str name="parsedquery"/>
<str name="parsedquery_toString"/>
<lst name="explain"/>
</lst>
</response>

Re: MoreLikeThis - No Results

Posted by Andy Pickler <an...@gmail.com>.
Answered my own question...

mlt.mintf: Minimum Term Frequency - the frequency below which terms will be
ignored in the source doc
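
To illustrate why a minimum term frequency of 2 leaves nothing to match on for a
short keyword field, here is a toy sketch in Python. It is not Solr's or Lucene's
actual code: the function name interesting_terms and the simplified tokens
(lowercased, no stemming) are assumptions made purely for illustration.

```python
from collections import Counter

def interesting_terms(tokens, min_tf=2):
    """Toy filter: keep terms that occur at least min_tf times in the source doc."""
    counts = Counter(tokens)
    return {term: tf for term, tf in counts.items() if tf >= min_tf}

# Rough stand-in for the analyzed value "Healthcare Cost Trends"
# (lowercased only; the real analyzer chain would also stem the tokens).
tokens = "healthcare cost trends".split()

default_terms = interesting_terms(tokens)            # threshold of 2, like the default
relaxed_terms = interesting_terms(tokens, min_tf=1)  # threshold lowered to 1

print(default_terms)
print(relaxed_terms)
```

With the threshold at 2, every term in a three-word keyword list is dropped,
which is consistent with the empty interestingTerms list in the response above.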

Our "source doc" is a short list of keywords, not a large content field, so
a term typically occurs only once.  In our case I need to set mlt.mintf to 1
(rather than the default of 2).  Now I'm getting results, and they are indeed
relevant.
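
For anyone wanting to reproduce the fix, the request might be built along these
lines. This is a minimal sketch: the base URL (host, port and core name) and the
helper name build_mlt_url are assumptions, while the query parameters are the
ones from the original request plus mlt.mintf.

```python
from urllib.parse import urlencode

def build_mlt_url(base_url, object_id, min_tf=1):
    """Build a MoreLikeThisHandler request URL with an explicit mlt.mintf."""
    params = [
        ("q", f"objectId:{object_id}"),
        ("mlt.fl", "competencyKeywords"),
        ("mlt.mintf", str(min_tf)),          # lowered from the default of 2
        ("mlt.interestingTerms", "details"),
        ("mlt.match.include", "false"),
    ]
    return f"{base_url}/mlt?{urlencode(params)}"

# Hypothetical base URL; substitute the real host and core.
url = build_mlt_url("http://localhost:8983/solr/collection1", "user91813")
print(url)
```

Setting the parameter per request keeps the change visible while debugging; it
could presumably also be set as a default on the handler in solrconfig.xml.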

Thanks,
Andy Pickler
