Posted to solr-user@lucene.apache.org by SergeyG <sg...@mail.ru> on 2009/07/02 11:31:21 UTC

Implementing PhraseQuery and MoreLikeThis Query in one app

Hi,

I recently posted a question about using stop words in a PhraseQuery
and in a MoreLikeThis query in the same app. I posted it twice, but
unfortunately I didn't get any responses. I realize that the question might
not have been formulated clearly, so let me reformulate it.

Can both queries - PhraseQuery and MoreLikeThis Query - be implemented in
the same app taking into account the fact that for the former to work the
stop words list needs to be included and this results in the latter putting
stop words among the most important words? Or do these two queries need to
use two different indexes, and thus have to be implemented in different
applications or in different cores of Solr (with different schema.xml files:
one with the StopWord Filter and another without it)?

Any opinion will be highly appreciated.

Thank you.

Regards,
Sergey Goldberg

P.S. Just for reference, here is my original message.

1. There are 3 kinds of searches in my application: a) PhraseQuery search; b)
search for separate words; c) MLT search. The problem I encountered is in
the use of a stop words list. If I don't take it into account, the MLT query
picks up common words as the most important words, which is not right. And
when I use it, the PhraseQuery stops working. I tried it with the ps and qs
parameters (ps=100, qs=100) but that didn't change anything. (Both indexed
fields are of type text, the StandardAnalyzer is applied, and all docs are
in English.)

2. Do I understand correctly that the query
q=id:1&mlt=true&mlt.fl=content&...
should bring back documents where the most important words are in the set of
those for the doc with id=1?
--
View this message in context: http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24303817.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by SergeyG <sg...@mail.ru>.

>Why would the inclusion of a stopword list result in stopwords being of
>top importance in the MoreLikeThis query?

Michael, 

I just saw some of them (words from the stop words list) in the MLT query's
response.

Sergey




Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Hi,

Rushing quickly through this one, one way you can use the same index for both is by copying fields.  One field copy would leave stopwords in (for PQ), and the other copy would remove stopwords (for MLT).  There may be more elegant ways to accomplish this - this is the first thing that comes to mind.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
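
The two-copies idea can be pictured with a small plain-Java sketch. It is an illustration only, not Solr or Lucene code: the class name, the stop list and the sample sentence are all made up for the example, and a real setup would do this through two differently analyzed fields in the schema. One copy of the text keeps stop words, so a phrase such as "not to be" can still match; the other copy drops them, so only content words are left for picking "important" terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: plain-Java stand-ins for two differently analyzed copies
// of one field. None of these names come from Solr or Lucene.
class TwoFieldCopiesSketch {

    // Hypothetical stop list; a real one would come from a stop words file.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "i", "his", "the", "to", "be", "or", "not"));

    // Lowercase, split on non-letters, and optionally drop stop words.
    static List<String> analyze(String text, boolean removeStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (raw.isEmpty()) continue;
            if (removeStopWords && STOP_WORDS.contains(raw)) continue;
            tokens.add(raw);
        }
        return tokens;
    }

    // True if the phrase tokens occur as one contiguous run in the field tokens.
    static boolean containsPhrase(List<String> field, List<String> phrase) {
        if (phrase.isEmpty()) return false;
        return Collections.indexOfSubList(field, phrase) >= 0;
    }

    public static void main(String[] args) {
        String text = "To be or not to be, said Tobin to his friend";
        List<String> phraseCopy = analyze(text, false); // stop words kept
        List<String> mltCopy = analyze(text, true);     // stop words removed
        System.out.println(containsPhrase(phraseCopy, analyze("not to be", false)));
        System.out.println(mltCopy);
    }
}
```

In this sketch the phrase lookup runs against the copy with stop words, while the stripped copy is what a term-selection step would see.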




Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by SergeyG <sg...@mail.ru>.

wunder, thank you. (Sorry, I'm not sure this is your first name). I thought
the MoreLikeThis query normally uses tf.idf of the terms when deciding what
terms are the most important (not the most frequent). And if this is not the
case, how can I change its behavior?
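
To make the tf.idf point concrete, here is a small self-contained sketch. It is not the MoreLikeThis implementation: the class name and the toy corpus are invented, and the idf formula (1 + ln(numDocs / (docFreq + 1))) is assumed here as the classic Lucene-style one. It only shows that in a very small index a stop word with a high term frequency can still outscore a rarer term, even though its idf is lower.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a toy tf * idf calculation, not MoreLikeThis itself.
class TfIdfSketch {

    // Assumed idf formula (classic Lucene style): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Raw term frequency of each token in one document.
    static Map<String, Integer> termFreqs(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // Number of documents that contain the term at least once.
    static int docFreq(String term, List<List<String>> corpus) {
        int n = 0;
        for (List<String> doc : corpus) if (doc.contains(term)) n++;
        return n;
    }

    // tf * idf of a term in the given document.
    static double score(String term, List<String> doc, List<List<String>> corpus) {
        int tf = termFreqs(doc).getOrDefault(term, 0);
        return tf * idf(docFreq(term, corpus), corpus.size());
    }

    public static void main(String[] args) {
        // Made-up three-document corpus; "a" plays the role of a stop word.
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("a", "a", "a", "a", "tobin", "a", "a"),
                Arrays.asList("a", "boat", "a"),
                Arrays.asList("a", "season"));
        List<String> doc = corpus.get(0);
        System.out.println("a     -> " + score("a", doc, corpus));
        System.out.println("tobin -> " + score("tobin", doc, corpus));
    }
}
```

With only a handful of docs the idf values stay close together, so the raw term frequency dominates the product.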




Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by SergeyG <sg...@mail.ru>.

Thanks, Otis. I'll try that right away and let you know the result. And if
you come up with any other idea, please let me know - just for the future.

Also thanks to Michael for the discussion.

Best regards,
Sergey




Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by SergeyG <sg...@mail.ru>.

Otis,

Here are the logs: method calls along with their outputs (sorry for the bulk
data :) ). I compared 3 runs.


1) GetMethod
 a) url=http://localhost:8080/solr/mlt
 b) query=q=id:10&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

Output:
INFO MLT2SearchRequestProcessor:87 - In method sendGetCommand(): url=http://localhost:8080/solr/mlt; queryString=q=id:10&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
INFO MLT2SearchRequestProcessor:76 - <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<result name="match" numFound="1" start="0" maxScore="2.098612">
  <doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc>
</result>
<result name="response" numFound="4" start="0" maxScore="0.28923997">
  <doc><float name="score">0.28923997</float><arr name="author"><str>O. Henry</str><str>S.G.</str></arr><str name="title">Four Million, The</str></doc>
  <doc><float name="score">0.08667877</float><arr name="author"><str>Katherine Mosby</str></arr><str name="title">The Season of Lillian Dawes</str></doc>
  <doc><float name="score">0.07947738</float><arr name="author"><str>Jerome K. Jerome</str></arr><str name="title">Three Men in a Boat</str></doc>
  <doc><float name="score">0.047219563</float><arr name="author"><str>Charles Oliver</str><str>S.G.</str></arr><str name="title">ABC's of Science</str></doc>
</result>
<lst name="interestingTerms">
  <float name="content_mlt:ye">1.0</float>
  <float name="content_mlt:tobin">1.0</float>
  <float name="content_mlt:a">1.0</float>
  <float name="content_mlt:i">1.0</float>
  <float name="content_mlt:his">1.0</float>
</lst>
</response>


2) GetMethod
 a) url=http://localhost:8080/solr/select
 b) query=q=id:10&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

Output:

INFO MLT2SearchRequestProcessor:87 - In method sendGetCommand(): url=http://localhost:8080/solr/select; queryString=q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
INFO MLT2SearchRequestProcessor:76 - <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">15</int>
  <lst name="params">
    <str name="fl">title author score</str>
    <str name="mlt.fl">content_mlt</str>
    <str name="q">id:10</str>
    <str name="mlt">true</str>
    <str name="mlt.interestingTerms">details</str>
    <str name="mlt.maxqt">5</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0" maxScore="2.098612">
  <doc><float name="score">2.098612</float><arr name="author"><str>S.G.</str></arr><str name="title">SG_Book</str></doc>
</result>
<lst name="moreLikeThis">
  <result name="10" numFound="4" start="0" maxScore="0.24578805">
    <doc><float name="score">0.24578805</float><arr name="author"><str>O. Henry</str><str>S.G.</str></arr><str name="title">Four Million, The</str></doc>
    <doc><float name="score">0.22171465</float><arr name="author"><str>Jerome K. Jerome</str></arr><str name="title">Three Men in a Boat</str></doc>
    <doc><float name="score">0.22018899</float><arr name="author"><str>Katherine Mosby</str></arr><str name="title">The Season of Lillian Dawes</str></doc>
    <doc><float name="score">0.098666154</float><arr name="author"><str>Charles Oliver</str><str>S.G.</str></arr><str name="title">ABC's of Science</str></doc>
  </result>
</lst>
<lst name="debug">
<str name="rawquerystring">id:10</str>
<str name="querystring">id:10</str>
<str name="parsedquery">id:10</str>
<str name="parsedquery_toString">id:10</str>
<lst name="explain"><str name="10">
2.098612 = (MATCH) weight(id:10 in 3), product of:
  0.99999994 = queryWeight(id:10), product of:
    2.0986123 = idf(docFreq=1, numDocs=5)
    0.47650534 = queryNorm
  2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
    1.0 = tf(termFreq(id:10)=1)
    2.0986123 = idf(docFreq=1, numDocs=5)
    1.0 = fieldNorm(field=id, doc=3)
</str></lst>
<str name="QParser">OldLuceneQParser</str>
<lst name="timing"><double name="time">15.0</double>
  <lst name="prepare"><double name="time">0.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
  </lst>
  <lst name="process"><double name="time">15.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">15.0</double></lst>
    <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
  </lst>
</lst>
</lst>
</response>


3) SolrJ call
 a) url=http://localhost:8080/solr
 b) query=q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

Output:

INFO MLTSearchRequestProcessor:45 - SolrServer url: http://localhost:8080/solr
INFO MLTSearchRequestProcessor:51 - id = 10
INFO MLTSearchRequestProcessor:53 - constructedQuery> id:10
INFO MLTSearchRequestProcessor:63 - solrQuery> q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
INFO MLTSearchRequestProcessor:69 - Number of docs found = 1
INFO MLTSearchRequestProcessor:73 - title = SG_Book; score = 2.098612
************

One can see that the results of the 2 runs with GetMethod are almost
identical: the same docs are found, although their scores differ somewhat.
The values themselves are doubtful, though. For example, the response
contains the original doc, although it wasn't supposed to be in the returned
list of "more like this" docs. Then its weight shows that its id=10 was
found in three other docs, which shouldn't be the case. (Or it's just a rare
coincidence that 10 is among the most important terms of this doc and other
docs happen to contain it, but that looks very unlikely. Or do I simply
misinterpret it?) Plus, the individual weights for "interestingTerms" are
all the same (1.0), and that's also questionable.
And the 3rd run (SolrJ call) returned just the original doc (with the same
weight as in the first two calls).

Maybe the problem lurks somewhere in solrconfig.xml? Right now I don't have
the slightest idea where to look for a hint.

Anyway, it's a holiday today. (Hopefully my message doesn't interrupt it. :) )

Have a great 4th of July!

Sergey
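
One detail in the run 2 output may be worth a closer look: the main <result name="response"> list has numFound="1", while the four similar docs sit in a separate <lst name="moreLikeThis"> section. If the SolrJ run reads its count from the main result list only, that could be why it reports a single doc; this is a guess, not something the logs confirm. The sketch below uses only the JDK's XML parser on a trimmed-down copy of the run 2 response to pull titles out of either section. The class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustration only: reads a Solr-style XML response with the JDK DOM parser.
class MltResponseSketch {

    // Trimmed-down copy of the /select response from run 2 (two of four docs kept).
    static final String SAMPLE =
            "<response>"
          + "<result name=\"response\" numFound=\"1\" start=\"0\">"
          + "<doc><str name=\"title\">SG_Book</str></doc></result>"
          + "<lst name=\"moreLikeThis\"><result name=\"10\" numFound=\"4\" start=\"0\">"
          + "<doc><str name=\"title\">Four Million, The</str></doc>"
          + "<doc><str name=\"title\">Three Men in a Boat</str></doc>"
          + "</result></lst>"
          + "</response>";

    // Collect the text of every <str name="title"> under the <result> element
    // whose name attribute equals resultName (e.g. "response" or "10").
    static List<String> titles(String xml, String resultName) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            NodeList results = dom.getElementsByTagName("result");
            for (int i = 0; i < results.getLength(); i++) {
                Element result = (Element) results.item(i);
                if (!resultName.equals(result.getAttribute("name"))) continue;
                NodeList strs = result.getElementsByTagName("str");
                for (int j = 0; j < strs.getLength(); j++) {
                    Element s = (Element) strs.item(j);
                    if ("title".equals(s.getAttribute("name"))) out.add(s.getTextContent());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE, "response")); // the doc matched by q=id:10
        System.out.println(titles(SAMPLE, "10"));       // the similar docs for id 10
    }
}
```

The same lookup with "match" and "response" as names would fit the /mlt output from run 1, where the similar docs are in the main list instead.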



Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Sergey,

I think I confused you.  The comment about the fields listed in the "fl" parameter has nothing to do with the SolrJ calls not working.

For SolrJ calls not working my suggestion is to look at the logs and compare the GetMethod call with the SolrJ call.  Paste them if you want more people to look at them.


Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by SergeyG <sg...@mail.ru>.

Otis,

Thanks a lot. I'll certainly follow your advice and check the logs. Although,
I must say that I've already tried all possible variations of the string for
the "fl" parameter (spaces, commas, plus signs). More than that: the query
still doesn't fetch any docs (other than the one with the id
specified in the query) even when the line solrQuery.setParam("fl", "title
author score"); is commented out. So I suspect that the problem is that the
request with the url
"http://localhost:8080/solr/select?q=id:1&mlt=true&mlt.fl=content&..." for
some reason doesn't work properly. And when I use the GetMethod(url)
approach and send the url directly in the form
"http://localhost:8080/solr/mlt?q=id:1&mlt.fl=content&...", Solr picks up
the mlt component. (At least I'll have this backup solution if the main one
keeps committing sabotage. :) I'll just need to add a parser for the incoming
XML response.)

I'll continue my "research" of this issue and, if you're interested in
results, I'll definitely let you know.

Cheers,
Sergey
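
On the "fl" variations: in a URL query string a plus sign and a space end up as the same thing, since form-style encoding turns a space into "+" and a colon into "%3A". The short sketch below builds such a query string with the JDK's URLEncoder so the two request forms can be compared by eye. It is only an illustration; the class and method names are invented, and the parameter values are the ones quoted in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustration only: form-style encoding of the MLT request parameters.
class MltQueryStringSketch {

    // Encode one name=value pair the way an HTML form would (space becomes '+').
    static String pair(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Assemble the query string for a request with the MLT component switched on.
    static String mltQuery(String id, String mltField, String fieldList) {
        return String.join("&",
                pair("q", "id:" + id),
                pair("mlt", "true"),
                pair("mlt.fl", mltField),
                pair("fl", fieldList));
    }

    public static void main(String[] args) {
        // prints q=id%3A1&mlt=true&mlt.fl=content&fl=title+author+score
        System.out.println(mltQuery("1", "content", "title author score"));
    }
}
```

So "title author score" passed as a parameter value and "title+author+score" written directly into the URL should reach the server as the same field list.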


Otis Gospodnetic wrote:
> 
> 
> Sergey,
> 
> Glad to hear the suggestion worked!
> 
> I can't spot the problem (though I think you want to use a comma to
> separate the list of fields in the fl parameter value).
> I suggest you look at the servlet container logs and Solr logs and compare
> requests that these two calls make.  Once you see how the second one
> is different from the first one, you will probably be able to figure out
> how to adjust the second one to produce the same results as the first one.
> 
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> ----- Original Message ----
>> From: SergeyG <sg...@mail.ru>
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, July 2, 2009 6:17:59 PM
>> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
>> 
>> 
>> Otis,
>> 
>> Your recipe does work: after copying an indexing field and excluding stop
>> words the MoreLikeThis query started fetching meaningful results. :)
>> 
>> Just one issue remained. 
>> 
>> When I execute query in this way:
>> 
>> String query = "q=id:1&mlt.fl=content&...&fl=title+author+score";
>> HttpClient client = new HttpClient();
>> GetMethod get = new GetMethod("http://localhost:8080/solr/mlt");
>> get.setQueryString(query);
>> client.executeMethod(get);
>> ...
>> 
>> it works fine bringing results as an XML string. 
>> 
>> But when I use "Solr-like" approach:
>> 
>> String query = "id:1";
>> solrQuery.setQuery(query);
>> solrQuery.setParam("mlt", "true");
>> solrQuery.setParam("mlt.fl", "content");
>> solrQuery.setParam("fl", "title author score");
>> QueryResponse queryResponse = server.query( solrQuery );
>> 
>> the result contains only one doc with id=1 and no other "more like" docs. 
>> 
>> In my solrconfig.xml, I have these settings: 
>> ...
>> 
>> ...
>> 
>> I guess it all is a matter of syntax but I can't figure out what's wrong.
>> 
>> Thank you very much (and again, thanks to Michael and Walter).
>> 
>> Cheers,
>> Sergey
>> 
>> 
>> 
>> Michael Ludwig-4 wrote:
>> > 
>> > SergeyG wrote:
>> > 
>> >> Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
>> >> in the same app taking into account the fact that for the former to
>> >> work the stop words list needs to be included and this results in the
>> >> latter putting stop words among the most important words?
>> > 
>> > Why would the inclusion of a stopword list result in stopwords being of
>> > top importance in the MoreLikeThis query?
>> > 
>> > Michael Ludwig
>> > 
>> > 
>> 
>> -- 
>> View this message in context: 
>> http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24314840.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24319269.html

Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Sergey,

Glad to hear the suggestion worked!

I can't spot the problem (though I think you want to use a comma to separate the list of fields in the fl parameter value).
I suggest you look at the servlet container logs and Solr logs and compare the requests that these two calls make.  Once you see how the second one is different from the first one, you will probably be able to figure out how to adjust the second one to produce the same results as the first one.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: SergeyG <sg...@mail.ru>
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 2, 2009 6:17:59 PM
> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
> 
> 
> Otis,
> 
> Your recipe does work: after copying an indexing field and excluding stop
> words the MoreLikeThis query started fetching meaningful results. :)
> 
> Just one issue remains.
>
> When I execute the query this way:
> 
> String query = "q=id:1&mlt.fl=content&...&fl=title+author+score";
> HttpClient client = new HttpClient();
> GetMethod get = new GetMethod("http://localhost:8080/solr/mlt");
> get.setQueryString(query);
> client.executeMethod(get);
> ...
> 
> it works fine, returning the results as an XML string.
>
> But when I use the "Solr-like" approach:
> 
> String query = "id:1";
> solrQuery.setQuery(query);
> solrQuery.setParam("mlt", "true");
> solrQuery.setParam("mlt.fl", "content");
> solrQuery.setParam("fl", "title author score");
> QueryResponse queryResponse = server.query( solrQuery );
> 
> the result contains only the doc with id=1 and no other "more like" docs.
> 
> In my solrconfig.xml, I have these settings: 
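> <requestHandler name="/mlt" class="solr.MoreLikeThisHandler"> ...
> <requestHandler name="standard" class="solr.SearchHandler" default="true">
> ...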
> 
> I guess it is all a matter of syntax, but I can't figure out what's wrong.
> 
> Thank you very much (and again, thanks to Michael and Walter).
> 
> Cheers,
> Sergey
> 
> 
> 
> Michael Ludwig-4 wrote:
> > 
> > SergeyG wrote:
> > 
> >> Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
> >> in the same app taking into account the fact that for the former to
> >> work the stop words list needs to be included and this results in the
> >> latter putting stop words among the most important words?
> > 
> > Why would the inclusion of a stopword list result in stopwords being of
> > top importance in the MoreLikeThis query?
> > 
> > Michael Ludwig
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24314840.html
> Sent from the Solr - User mailing list archive at Nabble.com.
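
The comparison suggested above (putting the two requests side by side) can be roughed out even before opening the logs. The following is only a sketch in plain Java, using nothing beyond the standard library. The class and method names are hypothetical, the parameters elided as "..." in the original message are simply left out, the fl value uses the comma-separated form suggested above, and the "/solr/select" path for the second call is an assumption, since the thread does not show which handler the client library call actually reaches.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class RequestCompareSketch {

    // Percent-encode a single key or value as form data.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Join parameters into "k1=v1&k2=v2", keeping insertion order.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // Build "path?query" so two requests can be compared line by line.
    static String requestLine(String handlerPath, Map<String, String> params) {
        return handlerPath + "?" + toQueryString(params);
    }

    public static void main(String[] args) {
        // First call: the query string sent straight to the /mlt handler.
        Map<String, String> direct = new LinkedHashMap<>();
        direct.put("q", "id:1");
        direct.put("mlt.fl", "content");
        direct.put("fl", "title,author,score");

        // Second call: the same parameters plus mlt=true.
        Map<String, String> viaClient = new LinkedHashMap<>(direct);
        viaClient.put("mlt", "true");

        System.out.println(requestLine("/solr/mlt", direct));
        // ASSUMPTION: the client call ends up at the default handler path.
        System.out.println(requestLine("/solr/select", viaClient));
    }
}
```

If the logged requests really do differ in the handler path, that alone might explain the different response shapes, because the first call reaches the handler registered as /mlt while the second would reach the default one. The logs remain the authority here; the sketch only shows what to look for.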

Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by SergeyG <sg...@mail.ru>.

Otis,

Your recipe does work: after copying an indexing field and excluding stop
words the MoreLikeThis query started fetching meaningful results. :)

Just one issue remains.

When I execute the query this way:

String query = "q=id:1&mlt.fl=content&...&fl=title+author+score";
HttpClient client = new HttpClient();
GetMethod get = new GetMethod("http://localhost:8080/solr/mlt");
get.setQueryString(query);
client.executeMethod(get);
...

it works fine, returning the results as an XML string.

But when I use the "Solr-like" approach:

String query = "id:1";
solrQuery.setQuery(query);
solrQuery.setParam("mlt", "true");
solrQuery.setParam("mlt.fl", "content");
solrQuery.setParam("fl", "title author score");
QueryResponse queryResponse = server.query( solrQuery );

the result contains only the doc with id=1 and no other "more like" docs.

In my solrconfig.xml, I have these settings: 
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler"> ...
<requestHandler name="standard" class="solr.SearchHandler" default="true">
...

I guess it is all a matter of syntax, but I can't figure out what's wrong.

Thank you very much (and again, thanks to Michael and Walter).

Cheers,
Sergey

Michael Ludwig-4 wrote:
> 
> SergeyG wrote:
> 
>> Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
>> in the same app taking into account the fact that for the former to
>> work the stop words list needs to be included and this results in the
>> latter putting stop words among the most important words?
> 
> Why would the inclusion of a stopword list result in stopwords being of
> top importance in the MoreLikeThis query?
> 
> Michael Ludwig
> 
> 

-- 
View this message in context: http://www.nabble.com/Implementing-PhraseQuery-and-MoreLikeThis-Query-in-one-app-tp24303817p24314840.html

Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by Otis Gospodnetic <ot...@yahoo.com>.

I could be wrong about MLT - maybe it really does use TF IDF and not raw frequency.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Walter Underwood <wu...@netflix.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 2, 2009 10:26:33 AM
> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
> 
> I think it works better to use the highest tf.idf terms, not the highest tf.
> That is what I implemented for Ultraseek ten years ago. With tf, you get
> lots of terms with low discrimination power.
> 
> wunder
> 
> On 7/2/09 4:48 AM, "Otis Gospodnetic" <ot...@yahoo.com> wrote:
> 
> > 
> > Michael - because they are the most frequent, which is how MLT selects terms
> > to use for querying, IIRC.
> > 
> > 
> > Otis --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > ----- Original Message ----
> >> From: Michael Ludwig <ml...@as-guides.com>
> >> To: solr-user@lucene.apache.org
> >> Sent: Thursday, July 2, 2009 6:20:05 AM
> >> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
> >> 
> >> SergeyG wrote:
> >> 
> >>> Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
> >>> in the same app taking into account the fact that for the former to
> >>> work the stop words list needs to be included and this results in the
> >>> latter putting stop words among the most important words?
> >> 
> >> Why would the inclusion of a stopword list result in stopwords being of
> >> top importance in the MoreLikeThis query?
> >> 
> >> Michael Ludwig
> >

Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by Walter Underwood <wu...@netflix.com>.

I think it works better to use the highest tf.idf terms, not the highest tf.
That is what I implemented for Ultraseek ten years ago. With tf, you get
lots of terms with low discrimination power.

wunder

On 7/2/09 4:48 AM, "Otis Gospodnetic" <ot...@yahoo.com> wrote:

> 
> Michael - because they are the most frequent, which is how MLT selects terms
> to use for querying, IIRC.
> 
> 
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> ----- Original Message ----
>> From: Michael Ludwig <ml...@as-guides.com>
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, July 2, 2009 6:20:05 AM
>> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
>> 
>> SergeyG wrote:
>> 
>>> Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
>>> in the same app taking into account the fact that for the former to
>>> work the stop words list needs to be included and this results in the
>>> latter putting stop words among the most important words?
>> 
>> Why would the inclusion of a stopword list result in stopwords being of
>> top importance in the MoreLikeThis query?
>> 
>> Michael Ludwig
>
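
The difference between picking terms by raw tf and picking them by tf.idf, as described above, can be seen on a toy corpus. This is a minimal sketch in plain Java and not Lucene's or Ultraseek's actual term selection: the class name, the three sample documents, the whitespace-style tokenizer, and the idf formula log(N / df) are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TermRankingSketch {

    // Raw term frequencies for one document (lowercased, split on non-word chars).
    static Map<String, Integer> termFrequencies(String doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : doc.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Number of corpus documents that contain the term.
    static int docFrequency(String term, List<String> corpus) {
        int df = 0;
        for (String d : corpus) {
            if (termFrequencies(d).containsKey(term)) {
                df++;
            }
        }
        return df;
    }

    // Top n terms of doc, scored by tf alone or by tf * log(N / df).
    // doc must be a member of corpus, so df is at least 1.
    static List<String> topTerms(String doc, List<String> corpus, boolean useIdf, int n) {
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(doc).entrySet()) {
            double idf = useIdf
                    ? Math.log((double) corpus.size() / docFrequency(e.getKey(), corpus))
                    : 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }
        List<String> terms = new ArrayList<>(score.keySet());
        // Highest score first; ties broken alphabetically for stable output.
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return terms.subList(0, Math.min(n, terms.size()));
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "the cat sat on the mat and the cat slept",
                "the dog sat on the log",
                "the bird and the fish");
        String doc = corpus.get(0);
        System.out.println(topTerms(doc, corpus, false, 2)); // prints [the, cat]
        System.out.println(topTerms(doc, corpus, true, 2));  // prints [cat, mat]
    }
}
```

With tf alone the stop word "the" comes out on top. Under tf.idf it scores zero because it occurs in every document, which is the low discrimination power mentioned above.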

Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Michael - because they are the most frequent, which is how MLT selects terms to use for querying, IIRC.


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Michael Ludwig <ml...@as-guides.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 2, 2009 6:20:05 AM
> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
> 
> SergeyG wrote:
> 
> > Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
> > in the same app taking into account the fact that for the former to
> > work the stop words list needs to be included and this results in the
> > latter putting stop words among the most important words?
> 
> Why would the inclusion of a stopword list result in stopwords being of
> top importance in the MoreLikeThis query?
> 
> Michael Ludwig

Re: Implementing PhraseQuery and MoreLikeThis Query in one app

Posted by Michael Ludwig <ml...@as-guides.com>.

SergeyG wrote:

> Can both queries - PhraseQuery and MoreLikeThis Query - be implemented
> in the same app taking into account the fact that for the former to
> work the stop words list needs to be included and this results in the
> latter putting stop words among the most important words?

Why would the inclusion of a stopword list result in stopwords being of
top importance in the MoreLikeThis query?

Michael Ludwig