Posted to solr-user@lucene.apache.org by Chantal Ackermann <ch...@btelligent.de> on 2009/11/09 16:25:44 UTC

Similar documents from multiple cores with different schemas

Hi all,

my search for postings answering the following question hasn't 
produced any helpful hints so far. Maybe someone can point me in the 
right direction?

Situation:
I have two cores with slightly different schemas. Slightly means that 
some fields appear on both cores but there are some that are required in 
one core but optional in the other. Then there are fields that appear 
only in one core.
(I don't want to put them in one index, right now, because of the fields 
that might be required for only one type but not the other. But it's 
certainly an option.)

Question:
Is there a way to get similar contents from core B when the input (seed) 
to the comparison is a document from core A?

MoreLikeThis:
I was searching for MoreLikeThis, multiple schemas etc. As these are 
cores with different schemas, the posts on distributed search/sharding 
in combination with MoreLikeThis are not helpful. But maybe there is 
some other functionality that I am not aware of? Some similarity search? 
Or maybe it's possible to tweak MoreLikeThis just to return the fields 
and terms that could be used for a search on the other core?

Thanks for any input!
Chantal

Re: Similar documents from multiple cores with different schemas

Posted by Chantal Ackermann <ch...@btelligent.de>.
Thanks Alexey, this is working.

I've split it into query and boostQuery using dismax and it gives some 
appropriate results.
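
One way such a split could look is sketched below in Python. It is only an illustration, not the code used in this thread: the function name, the (field, term, boost) tuples, the main_count cut-off and the use of defType=dismax are assumptions. The strongest terms go into the main dismax query (q, searched across qf), and the remaining terms become optional boost queries (bq).

```python
from urllib.parse import urlencode


def dismax_params(terms, query_fields, main_count=3):
    """Split weighted terms into a dismax main query and boost queries.

    terms: list of (field, term, boost) tuples (hypothetical input shape).
    query_fields: fields of the target core to search, passed as qf.
    """
    # Strongest terms first; sorted() is stable, so ties keep input order.
    ranked = sorted(terms, key=lambda t: t[2], reverse=True)
    main, rest = ranked[:main_count], ranked[main_count:]

    params = [
        ("defType", "dismax"),
        ("q", " ".join(term for _, term, _ in main)),
        ("qf", " ".join(query_fields)),
        ("mm", "1"),  # at least one main term has to match
    ]
    # Each remaining term becomes its own bq clause. bq only influences
    # scoring, it does not restrict the result set.
    # Assumes terms are plain tokens without quotes or backslashes.
    params += [("bq", f'{field}:"{term}"^{boost}') for field, term, boost in rest]
    return params


example = dismax_params(
    [("title", "solr", 2.0), ("body", "lucene", 1.5), ("body", "index", 0.5)],
    query_fields=["title", "body"],
    main_count=2,
)
query_string = urlencode(example)  # append to the select URL of the second core
```

How many terms belong in q and how many in bq is a tuning decision that depends on the data.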

Cheers,
Chantal

Alexey Serba wrote:
>> Or maybe it's
>> possible to tweak MoreLikeThis just to return the fields and terms that
>> could be used for a search on the other core?
> Exactly
> 
> See parameter mlt.interestingTerms in MoreLikeThisHandler
> http://wiki.apache.org/solr/MoreLikeThisHandler
> 
> You can get the interesting terms and build a query (with N optional
> clauses + boosts) against the second core yourself
> 
> HIH,
> Alex

Re: Similar documents from multiple cores with different schemas

Posted by Alexey Serba <as...@gmail.com>.
> Or maybe it's
> possible to tweak MoreLikeThis just to return the fields and terms that
> could be used for a search on the other core?
Exactly

See parameter mlt.interestingTerms in MoreLikeThisHandler
http://wiki.apache.org/solr/MoreLikeThisHandler

You can get the interesting terms and build a query (with N optional
clauses + boosts) against the second core yourself
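
As a rough sketch of that idea in Python (an illustration, not code from this thread): build the request to the MoreLikeThisHandler on the first core, then turn the returned terms into a boosted query for the second core. The handler path /mlt, the core URL, the id field, the field_map from core A fields to core B fields, and the flat [term, boost, term, boost, ...] layout of the interestingTerms response are all assumptions to check against your own setup.

```python
from urllib.parse import urlencode


def mlt_terms_url(core_url, doc_id, fields):
    """URL asking core A for interesting terms of one document.

    Assumes a MoreLikeThisHandler is registered at /mlt and that the
    unique key field is called "id".
    """
    params = {
        "q": f"id:{doc_id}",
        "mlt.fl": ",".join(fields),
        "mlt.interestingTerms": "details",  # return terms with their boosts
        "mlt.boost": "true",
        "rows": 0,  # only the terms are needed, not similar docs from core A
        "wt": "json",
    }
    return f"{core_url}/mlt?{urlencode(params)}"


def pair_up(flat):
    """Turn [term, boost, term, boost, ...] into [(term, boost), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


def build_query(terms, field_map):
    """Build a query of optional, boosted clauses for core B.

    terms: (term, boost) pairs where term looks like "field:text".
    field_map: core A field name -> core B field name; terms from
    fields without a mapping are dropped.
    """
    clauses = []
    for term, boost in terms:
        field, sep, text = term.partition(":")
        target = field_map.get(field)
        if not sep or target is None:
            continue
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{target}:"{escaped}"^{boost:.2f}')
    # Space-separated clauses are optional when the default operator is OR.
    return " ".join(clauses)


url = mlt_terms_url("http://localhost:8983/solr/coreA", "42", ["title", "body"])
flat = ["title:solr", 1.0, "body:lucene", 0.5, "genre:drama", 0.3]
query = build_query(pair_up(flat), {"title": "name", "body": "description"})
```

The resulting string can be sent as the q parameter to the second core. The field_map is where the schema differences between the two cores are handled.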

HIH,
Alex



Re: Similar documents from multiple cores with different schemas

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Chantal,

What you described in the last sentence should work.  You can search by example, using all or part of the document from core A as the query against core B.  That is, more or less, what MLT does under the hood anyway.
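
A minimal Python sketch of such a search by example, under stated assumptions: the document from core A is available as a dict, shared_fields lists the fields that exist in both cores, and plain term frequency stands in for the tf/idf-based term selection that MLT performs. The function name and parameters are made up for illustration.

```python
import re
from collections import Counter


def seed_query(doc, shared_fields, max_terms=10, min_len=3):
    """Pick the most frequent words from the shared fields of a document.

    doc: the seed document from core A as a dict (field -> value or list).
    shared_fields: fields that both cores have in common.
    Returns a space-separated string usable as a query against core B.
    """
    counts = Counter()
    for field in shared_fields:
        value = doc.get(field)
        if value is None:
            continue  # optional field that this document does not have
        values = value if isinstance(value, list) else [value]
        for v in values:
            words = re.findall(r"\w+", str(v).lower())
            counts.update(w for w in words if len(w) >= min_len)
    return " ".join(term for term, _ in counts.most_common(max_terms))


doc_a = {
    "title": "Solr search",
    "description": "Search with Solr and Lucene",
    "internal_code": "X1",  # exists only in core A, so it is not used
}
q = seed_query(doc_a, ["title", "description"], max_terms=2)
```

Raw frequency favours common words, so a stop word list or the interesting terms from the MoreLikeThisHandler will usually give a better seed.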

Otis


