Posted to solr-user@lucene.apache.org by Brian Lamb <br...@journalexperts.com> on 2011/04/07 19:30:11 UTC

MoreLikeThis match

Hi all,

I've been using MoreLikeThis for a while through select:

http://localhost:8983/solr/select/?q=field:more like this&mlt=true&mlt.fl=field&rows=100&fl=*,score

I was looking over the wiki page today and saw that you can also do this:

http://localhost:8983/solr/mlt/?q=field:more like this&mlt=true&mlt.fl=field&rows=100

which seems to run faster and do a better job overall. When the results are
returned, they are formatted like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="match" numFound="24" start="0" maxScore="3.0438285">
    <doc>
      <float name="score">3.0438285</float>
      <str name="id">5</str>
    </doc>
  </result>
  <result name="response" numFound="4077" start="0" maxScore="0.12775186">
    <doc>
      <float name="score">0.1125823</float>
      <str name="id">3</str>
    </doc>
    <doc>
      <float name="score">0.10231556</float>
      <str name="id">8</str>
    </doc>
 ...
  </result>
</response>

It seems that match always contains just one document, while the number of
documents under response is set by the rows parameter. How can I get more
than one result under match?

What I'm trying to do is this: whatever is set for field:, I would like to
return the top 100 records that match that search based on MoreLikeThis.

Thanks,

Brian Lamb
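
The two request URLs in this message carry literal spaces in the q value, which
some HTTP clients will not send as written. Below is a minimal Python sketch of
building the same two requests with percent-encoded parameters. The helper name
mlt_url and the SOLR_BASE constant are made up for illustration; the handler
paths (select, mlt) and the parameter names are the ones quoted above.

```python
from urllib.parse import urlencode

# Host and port copied from the URLs in the message; adjust for a real setup.
SOLR_BASE = "http://localhost:8983/solr"


def mlt_url(handler, query, mlt_field, rows=100, extra=None):
    """Return a percent-encoded MoreLikeThis request URL (hypothetical helper)."""
    params = {"q": query, "mlt": "true", "mlt.fl": mlt_field, "rows": rows}
    if extra:
        params.update(extra)
    # urlencode escapes spaces, colons, and other reserved characters.
    return f"{SOLR_BASE}/{handler}/?{urlencode(params)}"


# Same two requests as in the message: via /select and via /mlt.
select_url = mlt_url("select", "field:more like this", "field",
                     extra={"fl": "*,score"})
handler_url = mlt_url("mlt", "field:more like this", "field")

print(select_url)
print(handler_url)
```

Only the encoding changes here; the parameters sent to Solr are the same as in
the hand-written URLs.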

Re: Trying to Post. Emails rejected as spam.

Posted by Parker Johnson <pj...@yahoo.com>.
I have tried to change to plain text format and reword my question several
times.  Weird and annoying.  Here is my question, maybe it'll somehow go through
this time:

In my master/slave setup, my slaves are polling the master every minute.  My
indexes are getting large, to the point where I might take more than a minute to
pull a fresh index over the wire.  What is the behavior of a slave if it takes
more than 1 minute to fetch the indexes from the master?  Is the slave smart
enough to know a previous replication request is being serviced and to not start
another request?

-Parker



----- Original Message ----
From: Paul Rogers <pa...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Thu, April 7, 2011 12:34:25 PM
Subject: Re: Trying to Post. Emails rejected as spam.

Hi Park

I had the same problem.  I noticed one of the issues with the blocked
messages is that they are HTML/Rich Text.

(FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,
HTML_MESSAGE <-,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL

In GMail I can switch to plain text.  This fixed the problem for me.
If you can do the same in Yahoo you should find it reduces the spam
score sufficiently to allow the messages through.

Regards

Paul

On 7 April 2011 20:21, Ezequiel Calderara <ez...@gmail.com> wrote:
>
> Happened to me a couple of times, couldn't find a workaround...
>
> On Thu, Apr 7, 2011 at 4:14 PM, Parker Johnson <pj...@yahoo.com> wrote:
>
> >
> > Hello everyone.  Does anyone else have problems posting to the list?  My
> > messages keep getting rejected with this response below.  I'll be surprised
> > if this one makes it through :)
> >
> > -Park
> >
> > Sorry, we were unable to deliver your message to the following address.
> >
> > <so...@lucene.apache.org>:
> > Remote  host said: 552 spam score (8.0) exceeded threshold
> >
> >
> > (FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL
> >  ) [BODY]
> >
> > --- Below this line is a copy of the message.
> >
>
>
>
> --
> ______
> Ezequiel.
>
> Http://www.ironicnet.com


Re: Trying to Post. Emails rejected as spam.

Posted by Paul Rogers <pa...@gmail.com>.
Hi Park

I had the same problem.  I noticed one of the issues with the blocked
messages is that they are HTML/Rich Text.

(FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,
HTML_MESSAGE <-,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL

In GMail I can switch to plain text.  This fixed the problem for me.
If you can do the same in Yahoo you should find it reduces the spam
score sufficiently to allow the messages through.

Regards

Paul


Re: Trying to Post. Emails rejected as spam.

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Apr 07, 2011 at 04:21:25PM -0300, Ezequiel Calderara wrote:
> Happened to me a couple of times, couldn't find a workaround...

Note that the property "HTML_MESSAGE" has contributed to the email's spam
score:

> > (FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL
> >  ) [BODY]

This issue often crops up at Apache.  Sending your messages as plain text
rather than HTML resolves it 99% of the time.

Marvin Humphrey
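
For anyone posting from a script rather than a mail client, the plain-text
advice above can be sketched in Python with the standard library's
email.message.EmailMessage. This is only an illustration: the addresses and
subject are placeholders, the helper name plain_text_message is made up, and
whether a given message passes the list's filter still depends on the other
rules that fire.

```python
from email.message import EmailMessage


def plain_text_message(sender, recipient, subject, body):
    """Build a single-part text/plain message (no HTML alternative)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Passing a str to set_content produces a text/plain body.
    msg.set_content(body)
    return msg


# Placeholder addresses, not real list or user addresses.
msg = plain_text_message(
    "user@example.com",
    "list@example.org",
    "Test post",
    "Does this message make it through?\n",
)

print(msg.get_content_type())
print(msg.is_multipart())
```

A message built this way has no text/html part, which is the property the
HTML_MESSAGE rule name suggests the filter is looking at.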


Re: Trying to Post. Emails rejected as spam.

Posted by Ezequiel Calderara <ez...@gmail.com>.
Happened to me a couple of times, couldn't find a workaround...




-- 
______
Ezequiel.

Http://www.ironicnet.com

Re: Trying to Post. Emails rejected as spam.

Posted by Peter Sturge <pe...@gmail.com>.
This happens almost always because you're sending from a 'free' mail
account (gmail, yahoo, hotmail, etc), and your message contains words
that spam filters don't like.
For me, it was the use of the word 'remplica' (deliberately
mis-spelled so this mail gets sent).

It can also happen from 'non-free' mail servers that have been
successfully attacked by spambots, so that filters give it a really
bad reputation score.



Trying to Post. Emails rejected as spam.

Posted by Parker Johnson <pj...@yahoo.com>.
Hello everyone.  Does anyone else have problems posting to the list?  My 
messages keep getting rejected with this response below.  I'll be surprised if 
this one makes it through :)

-Park

Sorry, we were unable to deliver your message to the following address.

<so...@lucene.apache.org>:
Remote  host said: 552 spam score (8.0) exceeded threshold  
(FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL
  ) [BODY]

--- Below this line is a copy of the message.
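
The rejection text above packs the score and the names of the rules that fired
into one parenthesised list. A small Python sketch for pulling them apart,
assuming the bounce always has the shape shown in this message (the function
name parse_bounce is made up for illustration):

```python
import re

# Bounce text copied from the message above.
BOUNCE = (
    "Remote  host said: 552 spam score (8.0) exceeded threshold\n"
    "(FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,"
    "RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL\n"
    "  ) [BODY]"
)


def parse_bounce(text):
    """Return (score, rule_names) from a bounce shaped like the one above."""
    score = float(re.search(r"spam score \(([\d.]+)\)", text).group(1))
    # The rule list is the parenthesised group right after "threshold".
    blob = re.search(r"threshold\s*\(([^)]*)\)", text).group(1)
    rules = [name.strip() for name in blob.split(",") if name.strip()]
    return score, rules


score, rules = parse_bounce(BOUNCE)
print(score)
for name in rules:
    print(name)
```

Listing the rules one per line makes it easier to spot entries such as
HTML_MESSAGE that the sender can actually do something about.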

Re: MoreLikeThis match

Posted by Mike Mattozzi <mi...@gmail.com>.
Match is the document that's the top result of the query (q param)
that you specify.

Response is the list of documents that are similar to the 'match' document.

-Mike
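
To make the distinction concrete, here is a minimal Python sketch that reads
the response posted earlier in the thread and separates the two sections. The
XML is an abbreviated copy of that response (the elided "..." documents are
left out), and the function name docs_by_section is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the response from the original post.
SAMPLE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="match" numFound="24" start="0" maxScore="3.0438285">
    <doc>
      <float name="score">3.0438285</float>
      <str name="id">5</str>
    </doc>
  </result>
  <result name="response" numFound="4077" start="0" maxScore="0.12775186">
    <doc>
      <float name="score">0.1125823</float>
      <str name="id">3</str>
    </doc>
    <doc>
      <float name="score">0.10231556</float>
      <str name="id">8</str>
    </doc>
  </result>
</response>"""


def docs_by_section(xml_text):
    """Map each <result> name to a list of (id, score) tuples."""
    root = ET.fromstring(xml_text)
    sections = {}
    for result in root.findall("result"):
        docs = []
        for doc in result.findall("doc"):
            doc_id = doc.find("str[@name='id']").text
            score = float(doc.find("float[@name='score']").text)
            docs.append((doc_id, score))
        sections[result.get("name")] = docs
    return sections


sections = docs_by_section(SAMPLE)
print("match (top hit for q):", sections["match"])
print("response (similar docs):", sections["response"])
```

Since the two sections answer different questions (how well a document matches
q versus how similar a document is to the match), their scores are probably not
comparable with each other.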

On Mon, Apr 11, 2011 at 4:55 PM, Brian Lamb
<br...@journalexperts.com> wrote:
> Does anyone have any thoughts on this one?
>
> On Fri, Apr 8, 2011 at 9:26 AM, Brian Lamb <br...@journalexperts.com> wrote:
>
>> I've looked at both wiki pages and none really clarify the difference
>> between these two. If I copy and paste an existing index value for field and
>> do an mlt search, it shows up under match but not results. What is the
>> difference between these two?
>>
>>
>> On Thu, Apr 7, 2011 at 2:24 PM, Brian Lamb <br...@journalexperts.com> wrote:
>>
>>> Actually, what is the difference between "match" and "response"? It seems
>>> that match always returns one result but I've thrown a few cases at it where
>>> the score of the highest response is higher than the score of match. And
>>> then there are cases where the match score dwarfs the highest response
>>> score.
>

Re: MoreLikeThis match

Posted by Brian Lamb <br...@journalexperts.com>.
Does anyone have any thoughts on this one?


Re: MoreLikeThis match

Posted by Brian Lamb <br...@journalexperts.com>.
I've looked at both wiki pages and neither really clarifies the difference
between these two. If I copy and paste an existing index value for field and
do an mlt search, it shows up under match but not results. What is the
difference between these two?


Re: MoreLikeThis match

Posted by Brian Lamb <br...@journalexperts.com>.
Actually, what is the difference between "match" and "response"? It seems
that match always returns one result but I've thrown a few cases at it where
the score of the highest response is higher than the score of match. And
then there are cases where the match score dwarfs the highest response
score.
