Posted to infrastructure-issues@apache.org by "Mitch (JIRA)" <ji...@apache.org> on 2010/07/21 12:11:53 UTC

[jira] Created: (INFRA-2889) Solr Mailing List Spam detection

Solr Mailing List Spam detection
--------------------------------

                 Key: INFRA-2889
                 URL: https://issues.apache.org/jira/browse/INFRA-2889
             Project: Infrastructure
          Issue Type: Bug
      Security Level: public (Regular issues)
            Reporter: Mitch


Hello,

I have a problem: the Solr mailing list classifies one of my postings as spam.

Here is the mail I got back from the server:

<quote>
This message was created automatically by mail delivery software.

A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:

  solr-user@lucene.apache.org
    SMTP error from remote mail server after end of data:
    host mx1.eu.apache.org [192.87.106.230]: 552 spam score (7.8) exceeded threshold

------ This is a copy of the message, including all the headers. ------

Return-path: <mi...@web.de>
Received: from ben.nabble.com ([192.168.236.152])
	by kuber.nabble.com with esmtp (Exim 4.63)
	(envelope-from <mi...@web.de>)
	id 1Oak1S-0007kp-SK
	for solr-user@lucene.apache.org; Sun, 18 Jul 2010 23:40:18 -0700
Date: Sun, 18 Jul 2010 23:40:18 -0700 (PDT)
From: MitchK <mi...@web.de>
To: solr-user@lucene.apache.org
Message-ID: <12...@n3.nabble.com>
Subject: Solr in an extra project, what about replication, scaling, etc.?
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SA-Exim-Connect-IP: 192.168.236.152
X-SA-Exim-Mail-From: mitch91@web.de
X-SA-Exim-Scanned: No (on kuber.nabble.com); SAEximRunCond expanded to false


First of all: this post has been pending at Nabble for a few days, so please
bear with me if you see it for the second or third time.
------
Hello, 

During talks with students at my future university, I got some ideas for a
small student project (I hope this is the correct term).

The goal of the project is to enrich search with information that is
not accessible through a Lucene index.

Unfortunately, this would also be my first experience with my "own" Java EE project.
So I have some questions about Solr and a project that Solr is
part of.

The architecture:
I have my enrichment engine, which does some magic on a search query, and
in addition to that I have Solr itself.
I want to reduce HTTP overhead by using embedded Solr in the
enrichment application.
Once the enrichment engine has done its work, it queries Solr directly for
search results.

What happens if the number of queries becomes so large that one
Solr instance cannot handle it?
Does embedded Solr support distributed search?

Do I really need embedded Solr in this case?
The argument against reducing HTTP overhead is that I am thinking in terms of a
distributed environment. So if I used Embedded Solr, and *if* Embedded
Solr supports distributed search, each node would have to communicate with
the others over HTTP requests, which would make my "optimization" redundant,
right? Or is it still a useful optimization? What are your experiences?

Another significant reason to use Embedded Solr is this:
When I want another application to add a few lines to the Solr response,
that application has to parse Solr's response. Since I have no
experience with such a use case at large scale, I do not know whether
this would make a significant difference to performance.
If I use Embedded Solr, I hope to avoid parsing Solr's response when I
want to modify it, because I can add whatever I want directly at the point
where I do something like System.out.println(Solr's response). I know that it is not
*that* easy, but it shows the idea.

Thank you for your help!
- Mitch
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Solr-in-an-extra-project-what-about-replication-scaling-etc-tp977812p977812.html
Sent from the Solr - User mailing list archive at Nabble.com.
</quote>

Any suggestions as to why this happens?
Thank you.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (INFRA-2889) Solr Mailing List Spam detection

Posted by "Tony Stevenson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/INFRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Stevenson closed INFRA-2889.
---------------------------------

    Resolution: Won't Fix

Sorry we left this so long; as a result, the logs that would show why this message was classified as spam are long gone.
Mail to the lists is often blocked because it is:

- not sent as text/plain
- sent from an RBL-listed IP address
- using keywords that carry a higher spam weighting.
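Of the three causes above, only the first can be checked against the copy in the bounce: its headers declare "Content-Type: text/plain; charset=us-ascii", which suggests the first cause probably did not apply. The copy only shows Nabble's internal hop (192.168.236.152, a private address), so the public IP an RBL would have checked is not visible. The Python sketch below illustrates all three causes under stated assumptions. It checks the MIME type with the standard email module, builds the reversed-octet name that a DNS blocklist (RBL) lookup queries, and sums per-rule weights against a threshold in a SpamAssassin-like style. The rule names, weights, the 5.0 threshold, the documentation address 192.0.2.99 and the zone dnsbl.example.org are all hypothetical. The issue does not say which rules fired or what threshold mx1.eu.apache.org used.

```python
from email import message_from_string

# Headers copied from the bounced message in the issue (body replaced).
BOUNCED_COPY = """\
Subject: Solr in an extra project, what about replication, scaling, etc.?
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

(body omitted)
"""


def is_plain_text(raw_message: str) -> bool:
    """Cause 1: return True if the top-level MIME type is text/plain."""
    return message_from_string(raw_message).get_content_type() == "text/plain"


def dnsbl_query_name(ipv4: str, zone: str) -> str:
    """Cause 2: build the name a DNS blocklist (RBL) lookup would query.

    The octets are reversed and the blocklist zone is appended. A listed
    address resolves (typically to 127.0.0.x); an unlisted one does not.
    This sketch only builds the name and performs no DNS lookup.
    """
    octets = ipv4.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ipv4!r}")
    return ".".join(reversed(octets)) + "." + zone


def spam_verdict(rule_hits: dict[str, float], threshold: float) -> tuple[float, bool]:
    """Cause 3: sum the weights of the rules that fired and compare to a threshold.

    SpamAssassin-style: a message counts as spam when its score reaches the threshold.
    """
    score = round(sum(rule_hits.values()), 2)
    return score, score >= threshold


if __name__ == "__main__":
    print(is_plain_text(BOUNCED_COPY))
    # Hypothetical documentation address (TEST-NET-1) and hypothetical zone.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
    # Hypothetical rule names and weights; the rules that really fired are unknown.
    hits = {"HYPOTHETICAL_RULE_A": 2.5, "HYPOTHETICAL_RULE_B": 3.1, "HYPOTHETICAL_RULE_C": 2.2}
    print(spam_verdict(hits, threshold=5.0))  # hypothetical threshold
```

With these made-up weights the sketch reports (7.8, True), mirroring the score in the bounce. The point is that several individually small hits add up: no single rule needs to be decisive for a message to cross the threshold.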

Regards, 
Tony 



[jira] Updated: (INFRA-2889) Solr Mailing List Spam detection

Posted by "Gavin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/INFRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gavin updated INFRA-2889:
-------------------------

    Component/s: Mailing Lists
