Posted to solr-user@lucene.apache.org by David Parks <da...@yahoo.com> on 2012/12/25 11:04:35 UTC

MoreLikeThis supporting multiple document IDs as input?

I'm unclear on this point from the documentation. Is it possible to give
Solr X # of document IDs and tell it that I want documents similar to those
X documents?

Example:

  - The user is browsing 5 different articles
  - I send Solr the IDs of these 5 articles so I can present the user other
similar articles

I see this example for sending it 1 document ID:
http://localhost:8080/solr/select/?qt=mlt&q=id:[document
id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10

But can I send it 2+ document IDs as the query?


RE: MoreLikeThis supporting multiple document IDs as input?

Posted by David Parks <da...@yahoo.com>.
Someone else suggested this query: q=id:[10000001 OR 10000002], where the numbers represent multiple IDs, but if I get it, you're saying that these ultimately get turned into just one document and we get similar documents to just that one.

MoreLikeThese sounds promising. Is this in one of the development builds, or is it just an add-on I need to install? I haven't done much customization of Solr yet.

Thanks!
Dave


-----Original Message-----
From: Roman Chyla [mailto:roman.chyla@gmail.com] 
Sent: Wednesday, December 26, 2012 3:57 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis supporting multiple document IDs as input?

Jay Luker has written MoreLikeThese which is probably what you want. You may give it a try, though I am not sure if it works with Solr4.0 at this point (we didn't port it yet)

https://github.com/romanchyla/montysolr/blob/MLT/contrib/adsabs/src/java/org/apache/solr/handler/MoreLikeTheseHandler.java

roman


Re: MoreLikeThis supporting multiple document IDs as input?

Posted by Roman Chyla <ro...@gmail.com>.
Jay Luker has written MoreLikeThese which is probably what you want. You
may give it a try, though I am not sure if it works with Solr4.0 at this
point (we didn't port it yet)

https://github.com/romanchyla/montysolr/blob/MLT/contrib/adsabs/src/java/org/apache/solr/handler/MoreLikeTheseHandler.java

roman

On Wed, Dec 26, 2012 at 12:06 AM, Jack Krupansky <ja...@basetechnology.com> wrote:

> MLT has both a request handler and a search component.
>
> The MLT handler returns similar documents only for the first document that
> the query matches.
>
> The MLT search component returns similar documents for each of the
> documents in the search results, but processes each search result base
> document one at a time and keeps its similar documents segregated by each
> of the base documents.
>
> It sounds like you wanted to merge the base search results and then find
> documents similar to that merged super-document. Is that what you were
> really seeking, as opposed to what the MLT component does? Unfortunately,
> you can't do that with the components as they are.
>
> You would have to manually merge the values from the base documents and
> then you could POST that text back to the MLT handler and find similar
> documents using the posted text rather than a query. Kind of messy, but in
> theory that should work.
>
> -- Jack Krupansky
>
> -----Original Message----- From: David Parks
> Sent: Tuesday, December 25, 2012 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: MoreLikeThis supporting multiple document IDs as input?
>
>
> I'm unclear on this point from the documentation. Is it possible to give
> Solr X # of document IDs and tell it that I want documents similar to those
> X documents?
>
> Example:
>
>  - The user is browsing 5 different articles
>  - I send Solr the IDs of these 5 articles so I can present the user other
> similar articles
>
> I see this example for sending it 1 document ID:
> http://localhost:8080/solr/select/?qt=mlt&q=id:[document
> id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10
>
> But can I send it 2+ document IDs as the query?
>
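
Jack's suggestion quoted above (merge the values of the base documents by hand, then POST that text to the MLT handler) could be sketched roughly as follows. This is only an illustration under assumptions: the base URL, the field names, and the helper names are hypothetical, the handler is assumed to be registered as "/mlt", and whether it accepts a raw POST body as its input text depends on the Solr version and configuration. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def merge_field_values(docs, fields):
    """Concatenate the chosen field values of several documents into one text blob."""
    parts = []
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            if value is None:
                continue
            # Multi-valued fields arrive as lists in a JSON response.
            if isinstance(value, (list, tuple)):
                parts.extend(str(v) for v in value)
            else:
                parts.append(str(value))
    return " ".join(parts)


def build_mlt_post(base_url, text, similarity_fields, rows=10):
    """Build (but do not send) a POST request carrying the merged text.

    Assumption: an MLT handler is configured at <base_url>/mlt and reads
    the request body as the text to find similar documents for.
    """
    params = urlencode({
        "mlt.fl": ",".join(similarity_fields),
        "mlt.interestingTerms": "list",
        "rows": rows,
        "wt": "json",
    })
    return Request(
        url=f"{base_url}/mlt?{params}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
```

The documents passed to merge_field_values would come from an ordinary query for the IDs the user is browsing; passing the returned Request to urllib.request.urlopen would send it.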

RE: MoreLikeThis supporting multiple document IDs as input?

Posted by David Parks <da...@yahoo.com>.
Aha! &mlt=true was the key I hadn't worked out before (I thought it was
&qt=mlt that achieved that). Things are looking rosy now, and these results
are a perfect fit for my needs. Thanks very much for taking the time to
explain this!

David


-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com] 
Sent: Thursday, January 03, 2013 8:46 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis supporting multiple document IDs as input?

The MLT search component is enabled using &mlt=true and works on any normal
Solr query. It gives a batch of similar documents for each search result of
the original query, one batch per original query result. It uses the
&mlt.count=n parameter to control how many similar results to return for
each original query result.

The MLT request handler is a standalone request handler that does a query,
takes the first result, and then returns one batch of documents that are
similar to that one document. You have to configure the handler yourself,
but typically it would have the name "/mlt", so you would write:

http://10.0.0.1:8080/solr/mlt/?q=shoes&rows=3

It will show you both the single document from the original query and then
the batch of documents that are most similar to the top terms from that one
original document.

Add &debugQuery=true or &debug=query or &debug=results to see the terms that
are used in the secondary queries that find the similar documents.

There are a bunch of parameters that you have to tune for either approach.

-- Jack Krupansky



Re: MoreLikeThis supporting multiple document IDs as input?

Posted by Jack Krupansky <ja...@basetechnology.com>.
The MLT search component is enabled using &mlt=true and works on any normal 
Solr query. It gives a batch of similar documents for each search result of 
the original query, one batch per original query result. It uses the 
&mlt.count=n parameter to control how many similar results to return for 
each original query result.

The MLT request handler is a standalone request handler that does a query, 
takes the first result, and then returns one batch of documents that are 
similar to that one document. You have to configure the handler yourself, 
but typically it would have the name "/mlt", so you would write:

http://10.0.0.1:8080/solr/mlt/?q=shoes&rows=3

It will show you both the single document from the original query and then 
the batch of documents that are most similar to the top terms from that one 
original document.

Add &debugQuery=true or &debug=query or &debug=results to see the terms that 
are used in the secondary queries that find the similar documents.

There are a bunch of parameters that you have to tune for either approach.

-- Jack Krupansky
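
For the original use case (several document IDs in, one list of suggestions out), the search component described above could be driven along these lines. This is a hedged sketch, not a tested recipe: the base URL, field names, and helper names are made up, json.nl=map is added on the assumption that the "moreLikeThis" section then comes back as a map keyed by source document ID, and taking the top 2 of each batch is just the merging idea floated earlier in the thread.

```python
from urllib.parse import urlencode


def build_mlt_component_url(base_url, ids, similarity_fields, per_doc=5):
    """Query for several IDs at once and ask the MLT component for a batch per hit."""
    query = "id:(" + " OR ".join(str(i) for i in ids) + ")"
    params = urlencode({
        "q": query,
        "rows": len(ids),
        "fl": "id",
        "mlt": "true",
        "mlt.fl": ",".join(similarity_fields),
        "mlt.count": per_doc,
        "wt": "json",
        "json.nl": "map",  # assumption: makes "moreLikeThis" a map keyed by ID
    })
    return f"{base_url}/select?{params}"


def merge_batches(more_like_this, source_ids, top_per_doc=2):
    """Take the top hits of each per-document batch, skipping repeats and the source docs."""
    seen = {str(i) for i in source_ids}
    merged = []
    for batch in more_like_this.values():
        for doc in batch.get("docs", [])[:top_per_doc]:
            doc_id = str(doc["id"])
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```

Here more_like_this stands for the "moreLikeThis" part of the parsed JSON response. The merge is deliberately naive: it ignores scores and simply keeps the first occurrence of each ID.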

-----Original Message----- 
From: David Parks
Sent: Thursday, January 03, 2013 4:11 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

I'm not seeing the results I would expect. In the previous email below it's
stated that the "MLT search component" returns N results and K similar
documents per EACH of the N results.

If I'm not mistaken I access the "MLT search component" via a query to
/solr/select/?qt=mlt, such as this:

http://10.0.0.1:8080/solr/select/?qt=mlt&terms=true&q=shoes&rows=3

The query above for a simple term such as "shoes" can return many documents.
But I limited the results to 3, and I see 3 results, and they don't appear
to me to be any different from the results of this query:

http://107.23.102.164:8080/solr/select/?q=shoes&rows=3

So that suggests to me that solr maybe isn't handing things off to the MLT
component as expected (I don't know what results to expect so it's hard for
me to know where I'm trying to get to).

So I added a debugQuery=on parameter and I see this, possibly a useful
reference:

<str name="QParser">LuceneQParser</str>

It also appears that the MoreLikeThisComponent did indeed run

<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">

So maybe I should ask exactly what results I should be expecting here?

Thanks very much!
David



RE: MoreLikeThis supporting multiple document IDs as input?

Posted by David Parks <da...@yahoo.com>.
I'm not seeing the results I would expect. In the previous email below it's
stated that the "MLT search component" returns N results and K similar
documents per EACH of the N results.

If I'm not mistaken I access the "MLT search component" via a query to
/solr/select/?qt=mlt, such as this:

http://10.0.0.1:8080/solr/select/?qt=mlt&terms=true&q=shoes&rows=3

The query above for a simple term such as "shoes" can return many documents.
But I limited the results to 3, and I see 3 results, and they don't appear
to me to be any different from the results of this query:

http://107.23.102.164:8080/solr/select/?q=shoes&rows=3

So that suggests to me that solr maybe isn't handing things off to the MLT
component as expected (I don't know what results to expect so it's hard for
me to know where I'm trying to get to).

So I added a debugQuery=on parameter and I see this, possibly a useful
reference:

	<str name="QParser">LuceneQParser</str>

It also appears that the MoreLikeThisComponent did indeed run

	<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">

So maybe I should ask exactly what results I should be expecting here? 

Thanks very much!
David


-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com] 
Sent: Friday, December 28, 2012 8:13 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis supporting multiple document IDs as input?

Try a query that returns multiple results and you will see the difference.

MLT search component: n results, k similar documents per EACH of the n
results

MLT request handler: only FIRST result is examined, so only k similar
documents for that ONE (first) TOP search result.

Are you really saying that you don't comprehend what the difference is, or
simply that you don't LIKE the difference?! Or, maybe that you are wondering
WHY they are different? That latter question I don't have the answer to.

-- Jack Krupansky



Re: MoreLikeThis supporting multiple document IDs as input?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Try a query that returns multiple results and you will see the difference.

MLT search component: n results, k similar documents per EACH of the n 
results

MLT request handler: only FIRST result is examined, so only k similar 
documents for that ONE (first) TOP search result.

Are you really saying that you don't comprehend what the difference is, or 
simply that you don't LIKE the difference?! Or, maybe that you are wondering 
WHY they are different? That latter question I don't have the answer to.

-- Jack Krupansky
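
The distinction drawn above can be pictured with a toy model in plain Python. No Solr is involved here: the `similar` table is invented data standing in for whatever the index would consider similar, and both function names are hypothetical.

```python
def component_style(results, similar, k):
    """Search-component behaviour: one batch of up to k similar IDs per search result."""
    return {doc_id: similar.get(doc_id, [])[:k] for doc_id in results}


def handler_style(results, similar, k):
    """Request-handler behaviour: up to k similar IDs for the first search result only."""
    if not results:
        return []
    return similar.get(results[0], [])[:k]


# Invented data: three search results and their (pretend) nearest neighbours.
similar = {"a": ["x", "y", "z"], "b": ["y", "w"], "c": []}
results = ["a", "b", "c"]

per_result = component_style(results, similar, k=2)  # three batches, one per result
first_only = handler_style(results, similar, k=2)    # a single batch, for "a"
```

With n search results, the first function yields n separate batches, while the second yields one batch regardless of n.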

-----Original Message----- 
From: David Parks
Sent: Friday, December 28, 2012 2:48 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

So the Search Components are executed in series on _every_ request. I
presume then that they look at the request parameters and decide whether,
and how, to take action.

So in the case of the MLT component this was said:

> The MLT search component returns similar documents for each of the
> documents in the search results, but processes each search result base
> document one at a time and keeps its similar documents segregated by
> each of the base documents.

So what I think I understand is that the Query Component (presumably this
guy: org.apache.solr.handler.component.QueryComponent) takes the input from
the "q" parameter and returns a result (the "q=id:123456" ensures that the
Query Component will return just this one document).

The MltComponent then looks at the result from the QueryComponent and
generates its results.

The part that is still confusing is understanding the difference between
these two comments:

- The MLT search component returns similar documents for each of the
documents in the search results
- The MLT handler returns similar documents only for the first document
that the query matches.



-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodnetic@gmail.com]
Sent: Friday, December 28, 2012 1:26 PM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

Hi Dave,

Think of search components as a chain of Java classes that get executed
during each search request. If you open solrconfig.xml you will see how they
are defined and used.

HTH

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, "David Parks" <da...@yahoo.com> wrote:

> I'm somewhat new to Solr (it's running, I've been through the books,
> but I'm no master). What I hear you say is that MLT *can* accept, say,
> 5 documents and provide results, but the results would essentially be
> the same as running the query 5 times, once for each document?
>
> If that's the case, I might accept it. I would just have to merge them
> together at the end (perhaps I'd take the top 2 of each result, for
> example).
>
> Being somewhat new I'm a little confused by the difference between a
> "Search Component" and a "Handler". I've got the /mlt handler working
> and I'm using that. But how's that different from a "Search
> Component"? Is that referring to the default /solr/select?q="..."
> style query?
>
> And if what I said about multiple documents above is correct, what's
> the syntax to try that out?
>
> Thanks very much for the great help!
> Dave
>
>
> -----Original Message-----
> From: Jack Krupansky [mailto:jack@basetechnology.com]
> Sent: Wednesday, December 26, 2012 12:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: MoreLikeThis supporting multiple document IDs as input?
>
> MLT has both a request handler and a search component.
>
> The MLT handler returns similar documents only for the first document
> that the query matches.
>
> The MLT search component returns similar documents for each of the
> documents in the search results, but processes each search result base
> document one at a time and keeps its similar documents segregated by
> each of the base documents.
>
> It sounds like you wanted to merge the base search results and then
> find documents similar to that merged super-document. Is that what you
> were really seeking, as opposed to what the MLT component does?
> Unfortunately, you can't do that with the components as they are.
>
> You would have to manually merge the values from the base documents
> and then you could POST that text back to the MLT handler and find
> similar documents using the posted text rather than a query. Kind of
> messy, but in theory that should work.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: David Parks
> Sent: Tuesday, December 25, 2012 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: MoreLikeThis supporting multiple document IDs as input?
>
> I'm unclear on this point from the documentation. Is it possible to
> give Solr X # of document IDs and tell it that I want documents
> similar to those X documents?
>
> Example:
>
>   - The user is browsing 5 different articles
>   - I send Solr the IDs of these 5 articles so I can present the user
> other similar articles
>
> I see this example for sending it 1 document ID:
> http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10
>
> But can I send it 2+ document IDs as the query?
>
> 


RE: MoreLikeThis supporting multiple document IDs as input?

Posted by David Parks <da...@yahoo.com>.
So the Search Components are executed in series on _every_ request. I
presume then that they look at the request parameters and decide whether,
and how, to act.

So in the case of the MLT component this was said:

> The MLT search component returns similar documents for each of the 
> documents in the search results, but processes each search result base 
> document one at a time and keeps its similar documents segregated by 
> each of the base documents.

So what I think I understand is that the Query Component (presumably this
guy: org.apache.solr.handler.component.QueryComponent) takes the input from
the "q" parameter and returns a result (the "q=id:123456" ensures that the
Query Component will return just this one document).

The MltComponent then looks at the result from the QueryComponent and
generates its results.

The part that is still confusing is understanding the difference between
these two comments:

 - The MLT search component returns similar documents for each of the
documents in the search results
 - The MLT handler returns similar documents only for the first document
that the query matches.
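
To make the two statements concrete, here is a toy model in Python. It is
not Solr code: the SIMILAR table and the function names are made up for
illustration, and the only thing it encodes is the behaviour as described
in this thread (handler: first matched document only; component: one
separate list per matched document).

```python
# Toy model only, not Solr code. SIMILAR stands in for real MoreLikeThis
# scoring and is hypothetical data.
SIMILAR = {
    "1": ["7", "8", "9"],
    "2": ["8", "10"],
}


def similar_to(doc_id, count):
    """Return up to `count` IDs similar to `doc_id` from the toy table."""
    return SIMILAR.get(doc_id, [])[:count]


def mlt_handler(matched_ids, count=2):
    """Handler as described: only the first matched document is used."""
    if not matched_ids:
        return []
    return similar_to(matched_ids[0], count)


def mlt_component(matched_ids, count=2):
    """Component as described: a separate list for each matched document."""
    return {doc_id: similar_to(doc_id, count) for doc_id in matched_ids}


matched = ["1", "2"]  # what a query matching two documents might return
print(mlt_handler(matched))    # ['7', '8']
print(mlt_component(matched))  # {'1': ['7', '8'], '2': ['8', '10']}
```

In this model the handler silently ignores document "2", while the
component keeps the two lists apart and never merges them.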



-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodnetic@gmail.com] 
Sent: Friday, December 28, 2012 1:26 PM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

Hi Dave,

Think of search components as a chain of Java classes that get executed
during each search request. If you open solrconfig.xml you will see how they
are defined and used.

HTH

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, "David Parks" <da...@yahoo.com> wrote:

> I'm somewhat new to Solr (it's running, I've been through the books, 
> but I'm no master). What I hear you say is that MLT *can* accept, say 
> 5, documents and provide results, but the results would essentially be 
> the same as running the query 5 times for each document?
>
> If that's the case, I might accept it. I would just have to merge them 
> together at the end (perhaps I'd take the top 2 of each result, for 
> example).
>
> Being somewhat new I'm a little confused by the difference between a 
> "Search Component" and a "Handler". I've got the /mlt handler working 
> and I'm using that. But how's that different from a "Search 
> Component"? Is that referring to the default /solr/select?q="..." 
> style query?
>
> And if what I said about multiple documents above is correct, what's 
> the syntax to try that out?
>
> Thanks very much for the great help!
> Dave
>
>
> -----Original Message-----
> From: Jack Krupansky [mailto:jack@basetechnology.com]
> Sent: Wednesday, December 26, 2012 12:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: MoreLikeThis supporting multiple document IDs as input?
>
> MLT has both a request handler and a search component.
>
> The MLT handler returns similar documents only for the first document 
> that the query matches.
>
> The MLT search component returns similar documents for each of the 
> documents in the search results, but processes each search result base 
> document one at a time and keeps its similar documents segregated by 
> each of the base documents.
>
> It sounds like you wanted to merge the base search results and then 
> find documents similar to that merged super-document. Is that what you 
> were really seeking, as opposed to what the MLT component does? 
> Unfortunately, you can't do that with the components as they are.
>
> You would have to manually merge the values from the base documents 
> and then you could POST that text back to the MLT handler and find 
> similar documents using the posted text rather than a query. Kind of 
> messy, but in theory that should work.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: David Parks
> Sent: Tuesday, December 25, 2012 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: MoreLikeThis supporting multiple document IDs as input?
>
> I'm unclear on this point from the documentation. Is it possible to 
> give Solr X # of document IDs and tell it that I want documents 
> similar to those X documents?
>
> Example:
>
>   - The user is browsing 5 different articles
>   - I send Solr the IDs of these 5 articles so I can present the user 
> other similar articles
>
> I see this example for sending it 1 document ID:
> http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10
>
> But can I send it 2+ document IDs as the query?
>
>


RE: MoreLikeThis supporting multiple document IDs as input?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Dave,

Think of search components as a chain of Java classes that get executed
during each search request. If you open solrconfig.xml you will see how
they are defined and used.
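
As a rough mental model of that chain, here is a minimal Python sketch. The
class names echo the Solr components but the code is purely illustrative:
the process() signature, the comma-separated "q" value and the response
dictionary are assumptions made for the example, not Solr's Java API.

```python
# Toy model of a search component chain; illustrative only.
class QueryComponent:
    def process(self, params, response):
        # Always runs. Here "q" is pretended to be a comma-separated ID list.
        q = params.get("q", "")
        response["docs"] = q.split(",") if q else []


class MltComponent:
    def process(self, params, response):
        # Acts only when the request asks for it with mlt=true.
        if params.get("mlt") != "true":
            return
        response["moreLikeThis"] = {
            doc_id: "similar-to-" + doc_id for doc_id in response["docs"]
        }


def handle(params, chain):
    """Run every component in order; each one decides whether to act."""
    response = {}
    for component in chain:
        component.process(params, response)
    return response


chain = [QueryComponent(), MltComponent()]
print(handle({"q": "1,2"}, chain))
print(handle({"q": "1,2", "mlt": "true"}, chain))
```

The second request differs only in its parameters, yet it is the one where
the MLT step adds a section to the response.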

HTH

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, "David Parks" <da...@yahoo.com> wrote:

> I'm somewhat new to Solr (it's running, I've been through the books, but
> I'm no master). What I hear you say is that MLT *can* accept, say 5,
> documents and provide results, but the results would essentially be the
> same as running the query 5 times for each document?
>
> If that's the case, I might accept it. I would just have to merge them
> together at the end (perhaps I'd take the top 2 of each result, for
> example).
>
> Being somewhat new I'm a little confused by the difference between a
> "Search Component" and a "Handler". I've got the /mlt handler working and
> I'm using that. But how's that different from a "Search Component"? Is
> that referring to the default /solr/select?q="..." style query?
>
> And if what I said about multiple documents above is correct, what's the
> syntax to try that out?
>
> Thanks very much for the great help!
> Dave
>
>
> -----Original Message-----
> From: Jack Krupansky [mailto:jack@basetechnology.com]
> Sent: Wednesday, December 26, 2012 12:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: MoreLikeThis supporting multiple document IDs as input?
>
> MLT has both a request handler and a search component.
>
> The MLT handler returns similar documents only for the first document that
> the query matches.
>
> The MLT search component returns similar documents for each of the
> documents in the search results, but processes each search result base
> document one at a time and keeps its similar documents segregated by
> each of the base documents.
>
> It sounds like you wanted to merge the base search results and then find
> documents similar to that merged super-document. Is that what you were
> really seeking, as opposed to what the MLT component does? Unfortunately,
> you can't do that with the components as they are.
>
> You would have to manually merge the values from the base documents and
> then you could POST that text back to the MLT handler and find similar
> documents using the posted text rather than a query. Kind of messy, but in
> theory that should work.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: David Parks
> Sent: Tuesday, December 25, 2012 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: MoreLikeThis supporting multiple document IDs as input?
>
> I'm unclear on this point from the documentation. Is it possible to give
> Solr X # of document IDs and tell it that I want documents similar to those
> X documents?
>
> Example:
>
>   - The user is browsing 5 different articles
>   - I send Solr the IDs of these 5 articles so I can present the user other
> similar articles
>
> I see this example for sending it 1 document ID:
> http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10
>
> But can I send it 2+ document IDs as the query?
>
>

RE: MoreLikeThis supporting multiple document IDs as input?

Posted by David Parks <da...@yahoo.com>.
I'm somewhat new to Solr (it's running, I've been through the books, but I'm
no master). What I hear you say is that MLT *can* accept, say, 5 documents
and provide results, but the results would essentially be the same as
running the query 5 times, once for each document?

If that's the case, I might accept it. I would just have to merge them
together at the end (perhaps I'd take the top 2 of each result, for
example).

Being somewhat new I'm a little confused by the difference between a "Search
Component" and a "Handler". I've got the /mlt handler working and I'm using
that. But how's that different from a "Search Component"? Is that referring
to the default /solr/select?q="..." style query?

And if what I said about multiple documents above is correct, what's the
syntax to try that out?

Thanks very much for the great help!
Dave
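
For reference, a minimal Python sketch of the "top 2 of each, then merge"
idea. It assumes the MLT search component is enabled on the select URL and
is switched on with mlt=true, mlt.fl and mlt.count, and that the JSON
response carries a "moreLikeThis" section keyed by base document ID with a
"docs" list under each key. The helper names, the field names and the
sample data are hypothetical; the response shape should be checked against
a real response from your own Solr version.

```python
from urllib.parse import urlencode


def build_component_query(base_url, ids, fields, per_doc=2):
    """Build a select URL that matches several IDs and asks for MLT results."""
    params = {
        "q": "id:(" + " OR ".join(ids) + ")",
        "mlt": "true",
        "mlt.fl": ",".join(fields),
        "mlt.count": per_doc,
        "fl": "id",
        "rows": len(ids),
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def merge_top(more_like_this, base_ids, per_doc=2):
    """Take the top `per_doc` hits of each base document, skipping the base
    documents themselves and anything already collected."""
    seen = set(base_ids)
    merged = []
    for base_id in base_ids:
        docs = more_like_this.get(base_id, {}).get("docs", [])
        for doc in docs[:per_doc]:
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc["id"])
    return merged


url = build_component_query(
    "http://localhost:8080/solr/select", ["10000001", "10000002"], ["title"]
)
print(url)

# Hypothetical "moreLikeThis" section, shaped as assumed above.
sample = {
    "10000001": {"docs": [{"id": "7"}, {"id": "8"}, {"id": "9"}]},
    "10000002": {"docs": [{"id": "8"}, {"id": "10"}]},
}
print(merge_top(sample, ["10000001", "10000002"]))  # ['7', '8', '10']
```

Because duplicates are dropped after the top 2 are taken, the merged list
can be shorter than 2 per base document.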


-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com] 
Sent: Wednesday, December 26, 2012 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis supporting multiple document IDs as input?

MLT has both a request handler and a search component.

The MLT handler returns similar documents only for the first document that
the query matches.

The MLT search component returns similar documents for each of the documents
in the search results, but processes each search result base document one at
a time and keeps its similar documents segregated by each of the base
documents.

It sounds like you wanted to merge the base search results and then find
documents similar to that merged super-document. Is that what you were
really seeking, as opposed to what the MLT component does? Unfortunately,
you can't do that with the components as they are.

You would have to manually merge the values from the base documents and then
you could POST that text back to the MLT handler and find similar documents
using the posted text rather than a query. Kind of messy, but in theory that
should work.

-- Jack Krupansky

-----Original Message-----
From: David Parks
Sent: Tuesday, December 25, 2012 5:04 AM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis supporting multiple document IDs as input?

I'm unclear on this point from the documentation. Is it possible to give
Solr X # of document IDs and tell it that I want documents similar to those
X documents?

Example:

  - The user is browsing 5 different articles
  - I send Solr the IDs of these 5 articles so I can present the user other
similar articles

I see this example for sending it 1 document ID:
http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10

But can I send it 2+ document IDs as the query? 


Re: MoreLikeThis supporting multiple document IDs as input?

Posted by Jack Krupansky <ja...@basetechnology.com>.
MLT has both a request handler and a search component.

The MLT handler returns similar documents only for the first document that 
the query matches.

The MLT search component returns similar documents for each of the documents 
in the search results, but processes each search result base document one at 
a time and keeps its similar documents segregated by each of the base 
documents.

It sounds like you wanted to merge the base search results and then find 
documents similar to that merged super-document. Is that what you were 
really seeking, as opposed to what the MLT component does? Unfortunately, 
you can't do that with the components as they are.

You would have to manually merge the values from the base documents and then 
you could POST that text back to the MLT handler and find similar documents 
using the posted text rather than a query. Kind of messy, but in theory that 
should work.

-- Jack Krupansky
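
A rough Python sketch of the manual merge described above, under stated
assumptions: the base documents have already been fetched as dictionaries,
the MLT handler is registered at a path such as /solr/mlt, and it reads the
posted request body as the text to compare against. The helper names,
field names and sample documents are hypothetical, and nothing here talks
to Solr; it only prepares the request.

```python
from urllib.parse import urlencode


def merge_field_values(docs, fields):
    """Concatenate the chosen field values of all base documents."""
    parts = []
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            if value is None:
                continue
            # Multi-valued fields arrive as lists, single-valued as scalars.
            values = value if isinstance(value, list) else [value]
            parts.extend(str(v) for v in values)
    return "\n".join(parts)


def build_mlt_post(handler_url, text, fields, rows=10):
    """Return the URL, body and headers for POSTing text to the MLT handler."""
    params = {"mlt.fl": ",".join(fields), "fl": "id", "rows": rows, "wt": "json"}
    return {
        "url": handler_url + "?" + urlencode(params),
        "data": text.encode("utf-8"),
        "headers": {"Content-Type": "text/plain; charset=utf-8"},
    }


base_docs = [  # hypothetical base documents
    {"id": "1", "title": "Solr basics", "tags": ["search", "lucene"]},
    {"id": "2", "title": "Tuning MLT"},
]
text = merge_field_values(base_docs, ["title", "tags"])
request = build_mlt_post("http://localhost:8080/solr/mlt", text, ["title", "tags"])
print(request["url"])
print(request["data"])
```

The three pieces can be handed to urllib.request.Request(url, data=...,
headers=...) to send the request. Whether the handler accepts a posted body
in a given Solr setup needs to be verified there.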

-----Original Message----- 
From: David Parks
Sent: Tuesday, December 25, 2012 5:04 AM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis supporting multiple document IDs as input?

I'm unclear on this point from the documentation. Is it possible to give
Solr X # of document IDs and tell it that I want documents similar to those
X documents?

Example:

  - The user is browsing 5 different articles
  - I send Solr the IDs of these 5 articles so I can present the user other
similar articles

I see this example for sending it 1 document ID:
http://localhost:8080/solr/select/?qt=mlt&q=id:[document id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10

But can I send it 2+ document IDs as the query?