Posted to solr-dev@lucene.apache.org by Nayan Gowda <na...@gmail.com> on 2010/01/27 13:02:29 UTC

Using MoreLikeThisHandler

Hi,
     I am trying to use the MoreLikeThisHandler in order to get
similar documents.

Here is my configuration in schema.xml:

<fields>

<field name="id" type="sint" indexed="true" stored="true" required="true"
termVectors="true"/>

<field name="title" type="text" indexed="true" stored="false"
termVectors="true"/>

<field name="keywordGroup" type="string" indexed="true" stored="false"
multiValued="true" termVectors="true"/>

<field name="tagText" type="text" indexed="true" stored="true"
multiValued="true" default="" termVectors="true"/>

</fields>

Configuration in solrconfig.xml:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">

<lst name="defaults">

<str name="mlt.fl">title,tagText,keywordGroup</str>

<str name="mlt.qf">title^1.5 tagText keywordGroup^0.5</str>

<str name="mlt.mintf">1</str>

<str name="mlt.mindf">1</str>

<str name="mlt.boost">true</str>

<str name="mlt.match.include">true</str>

</lst>
</requestHandler>


and I fire the query like this:
http://10.99.82.12:8080/Dev/mlt/?q=id:7735&mlt.mindf=1&mlt.mintf=1&mlt.boost=true&mlt.match.include=true&mlt.fl=title,tagText,keywordGroup
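
For reference, the same request can also be assembled programmatically. This is only a minimal sketch in Python: the host, path and parameter values are copied from the URL above, while the variable names (BASE_URL, params, url) are made up for illustration.

```python
from urllib.parse import urlencode

# Base URL copied from the request above; adjust for your own deployment.
BASE_URL = "http://10.99.82.12:8080/Dev/mlt/"

# The same MoreLikeThis parameters that appear in the query string above.
params = {
    "q": "id:7735",
    "mlt.mindf": 1,
    "mlt.mintf": 1,
    "mlt.boost": "true",
    "mlt.match.include": "true",
    "mlt.fl": "title,tagText,keywordGroup",
}

# safe=":," keeps the colon and commas readable instead of percent-encoding them.
url = BASE_URL + "?" + urlencode(params, safe=":,")
print(url)
```

Keeping the parameters in a dictionary makes it easier to change one value at a time (for example mlt.fl) when comparing results.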

I do get some results, but they are not accurate.
Now I have a couple of questions:
1. Is this configuration correct for getting similar documents?

2. Is it possible to support a different boost for each "keywordGroup"?
If so, please give me a hint on how I can achieve this.

Thanks,
Nayan K

Re: Using MoreLikeThisHandler

Posted by Nayan Gowda <na...@gmail.com>.
Hi David,
Thanks for your reply. Now that the configuration is confirmed to be correct,
please take a look at the second requirement.
Here is the scenario:

I am indexing products in Solr. Each product belongs to "categories",
and each category belongs to a CategoryGroup.

The product, Category and CategoryGroup are all indexed in Solr.

For example, here are the CategoryGroups and their Categories:

BookType
         ScienceFiction
         Children
         Adventure

No of Pages
         100 - 500
         500 - 1000
         > 1000

*I need to get similar books for Book1*

Book1
BookType
         ScienceFiction
         Children
         Adventure

No of Pages
         100 - 500

-----------
Book2
BookType
         ScienceFiction

No of Pages
         100 - 500

-----------
Book3
BookType
         ScienceFiction
         Children

No of Pages
         100 - 500

-----------

Say I can provide a boost for each CategoryGroup: BookType = 0.5 and No of
Pages = 0.5.
Then the MoreLikeThisHandler should consider the matching categories and also
the boost given to the respective CategoryGroup, like below:

Book2 score:
 1 x 0.5 + 1 x 0.5 = 1.0

Book3 score:
 2 x 0.5 + 1 x 0.5 = 1.5


So Book3 will be more similar to Book1 than Book2 is. What I want to
achieve here is that, while calculating the score, both Category and
CategoryGroup are considered.
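
To make the arithmetic concrete, here is a small sketch in Python of the scoring I have in mind. It is not Solr code and does not reflect how MoreLikeThis scores documents internally; the names GROUP_BOOSTS, BOOKS and similarity are hypothetical and only mirror the example above.

```python
# Hypothetical boosts per CategoryGroup, taken from the example above.
GROUP_BOOSTS = {"BookType": 0.5, "No of Pages": 0.5}

# Hypothetical documents: each book maps a CategoryGroup to its categories.
BOOKS = {
    "Book1": {
        "BookType": {"ScienceFiction", "Children", "Adventure"},
        "No of Pages": {"100 - 500"},
    },
    "Book2": {
        "BookType": {"ScienceFiction"},
        "No of Pages": {"100 - 500"},
    },
    "Book3": {
        "BookType": {"ScienceFiction", "Children"},
        "No of Pages": {"100 - 500"},
    },
}


def similarity(source, candidate, boosts):
    """Sum, over all groups, of (shared categories in the group) x (group boost)."""
    return sum(
        boost * len(source.get(group, set()) & candidate.get(group, set()))
        for group, boost in boosts.items()
    )


# Score every other book against Book1 and rank the most similar first.
scores = {
    name: similarity(BOOKS["Book1"], categories, GROUP_BOOSTS)
    for name, categories in BOOKS.items()
    if name != "Book1"
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

With the data above this gives 1.0 for Book2 and 1.5 for Book3, so Book3 ranks first, which is the ordering I would like the handler to produce.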

Please let me know what needs to be tweaked in the MoreLikeThisHandler so that
this can be achieved.

Thanks
Nayan




On Fri, Jan 29, 2010 at 1:12 PM, David Stuart <
david.stuart@progressivealliance.co.uk> wrote:

> Hi Nayan,
>
> Your configuration looks good.
> Can you expand on your second question? When you say each keywordGroup, do
> you mean individual values in the field?
> A good way to see what is going on under the hood is to use the analyzer
> in the admin interface.
>
> Regards,
>
> Dave

Re: Using MoreLikeThisHandler

Posted by David Stuart <da...@progressivealliance.co.uk>.
Hi Nayan,

Your configuration looks good.
Can you expand on your second question? When you say each keywordGroup,
do you mean individual values in the field?
A good way to see what is going on under the hood is to use the
analyzer in the admin interface.

Regards,

Dave

On 29 Jan 2010, at 05:28, Nayan Gowda <na...@gmail.com> wrote:

> Hi,
>     I am trying to use the MoreLikeThisHandler in order to get
> similar documents.
> Here is my configuration in schema.xml:
>
> <fields>
>
> <field name="id" type="sint" indexed="true" stored="true" required="true"
> termVectors="true"/>
>
> <field name="title" type="text" indexed="true" stored="false"
> termVectors="true"/>
>
> <field name="keywordGroup" type="string" indexed="true" stored="false"
> multiValued="true" termVectors="true"/>
>
> <field name="tagText" type="text" indexed="true" stored="true"
> multiValued="true" default="" termVectors="true"/>
>
> </fields>
>
> Configuration in solrconfig.xml:
>
> <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
>
> <lst name="defaults">
>
> <str name="mlt.fl">title,tagText,keywordGroup</str>
>
> <str name="mlt.qf">title^1.5 tagText keywordGroup^0.5</str>
>
> <str name="mlt.mintf">1</str>
>
> <str name="mlt.mindf">1</str>
>
> <str name="mlt.boost">true</str>
>
> <str name="mlt.match.include">true</str>
>
> </lst>
> </requestHandler>
>
>
> and I fire the query like this:
> http://10.99.82.12:8080/Dev/mlt/?q=id:7735&mlt.mindf=1&mlt.mintf=1&mlt.boost=true&mlt.match.include=true&mlt.fl=title,tagText,keywordGroup
>
> I do get some results, but they are not accurate.
> Now I have a couple of questions:
> 1. Is this configuration correct for getting similar documents?
>
> 2. Is it possible to support a different boost for each "keywordGroup"?
> If so, please give me a hint on how I can achieve this.
>
> Thanks,
> Nayan K
