Posted to dev@jspwiki.apache.org by Janne Jalkanen <Ja...@ecyrd.com> on 2007/12/25 21:45:29 UTC

The guy's back...

Remember the guy who just posts random words on jspwiki.org,  
presumably to poison bayesian spam filters?

Well, he's back, and he seems to have fixed his system to correct for  
UTF-8 (which was an easy way to catch the bot).

I have to admit that this is no longer fun.  My next idea is to make
a JavaScript-based, automated captcha (if you've got JS on, the
captcha gets filled in automatically, so normally the user does not
have to care; if you've got JS off, you have to fill it in
yourself).  But after that, I'm running out of ideas.  I would really
hate to close down jspwiki.org, make registration mandatory, and have
all registrations moderated.
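
The server side of such an auto-filled captcha could look roughly like the sketch below. It assumes the challenge is a simple sum that page JavaScript can compute and write into the form field, while users without JS type the answer themselves. The names SECRET_KEY, new_challenge and verify are hypothetical and not part of JSPWiki. The sketch does not prevent replay of a solved challenge, and it only stops bots that neither run JavaScript nor parse the form.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-server secret; a real deployment would load it from config.
SECRET_KEY = secrets.token_bytes(32)


def sign(a: int, b: int) -> str:
    """Return an HMAC over the challenge so the client cannot forge one."""
    msg = f"{a}:{b}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def new_challenge() -> dict:
    """Create a challenge to embed in the edit form (two numbers plus a token)."""
    a = secrets.randbelow(90) + 10
    b = secrets.randbelow(90) + 10
    return {"a": a, "b": b, "token": sign(a, b)}


def verify(a: int, b: int, token: str, answer: str) -> bool:
    """Check that the token matches the challenge and the answer is the sum."""
    if not hmac.compare_digest(sign(a, b).encode(), token.encode()):
        return False
    try:
        return int(answer.strip()) == a + b
    except ValueError:
        return False
```

With JS on, a script in the page would add the two numbers and fill the answer field; with JS off, the same field is shown to the user.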

(Also, the guy is now posting random words, *amended* with random  
snippets of the content of the same page.  Therefore just checking  
for a single word is not possible.)

/Janne

Re: AW: The guy's back...

Posted by Janne Jalkanen <ja...@iki.fi>.
Basically a good idea, but it does not look like any of the addresses
collected by Christoph are in any of the AHBL's block lists (just a
random sampling though, didn't go through them all).  From my own
list, I tried five and got only one positive response.  None of them
were Tor nodes; I think they mostly are from zombie machines.
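
For reference, a DNS block list lookup of this kind can be sketched as follows. DNSBLs are conventionally queried by reversing the IPv4 octets and appending the list's zone; a listed address resolves (usually to 127.0.0.x), an unlisted one does not. The zone dnsbl.example.org is a placeholder and not AHBL's real zone, and both function names are made up for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a block list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """Return True if the block list has an entry for the address.

    Performs a real DNS query; the default zone is only a placeholder.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


print(dnsbl_query_name("60.10.6.170", "dnsbl.example.org"))
```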

However, it's been estimated that the bigger botnets only control
maybe 20,000 computers.  So it might be possible to automatically
gather a list of known wikispam addresses (and be subjected to a large
number of DOS attempts after that :-).

We would, in any case, need to develop the SpamFilter a bit more so
that it could return multiple responses and trigger either a direct
rejection or a captcha.  Not that it would be that difficult; I'm kinda
working on it already.

/Janne

On Thu, Jan 03, 2008 at 11:12:27AM +0100, Fabian Haupt wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I quickly looked over those addresses. Just dialup ranges. Maybe we
> could use something like the abusive hosts blocking list[1] to identify
> the evil ones?
> 
> [1] http://www.ahbl.org/
> 
> Christoph Sauer wrote:
> > No, not the same subnet. Here's a list I collected...
> > 
> >     <Valve className="org.apache.catalina.valves.RemoteAddrValve"
> >           allow="" deny="60.10.6.170, 60.32.219.68, 60.190.243.173,
> > 60.190.240.76, i61.57.40.31, 66.46.148.201, 74.231.24.2, 80.71.135.2,
> > i81.201.58.55, 83.236.135.140, 87.3.58.149, 122.214.180.254, 122.252.226.40,
> > 161.200.255.162, 200.226.134.53, 201.12.178.33, 202.70.201.34,
> > 203.69.39.251, 207.248.164, 207.248.164.199, 210.17.247.39, 210.73.88.144,
> > 216.32.162.164, 211.7.138.14, 217.149.193.70, 218.58.136.4, 222.221.6.144,
> > 222.190.96.196"/>
> > 
> > -----Original Message-----
> > From: Fabian Haupt [mailto:kazamatzuri@submerged-intelligence.de]
> > Sent: Thursday, 3 January 2008 10:28
> > To: jspwiki-dev@incubator.apache.org
> > Subject: Re: The guy's back...
> > 
> > Just a thought, but maybe he's using something like TOR? Are the IP
> > addresses completely unrelated or more or less in the same subnet? If he
> >  just redialed his line, i figure they would.
> > 
> > A thing we thought of as possibility to defend those, would be to allow
> > tor-edits just through some captchas. So we wouldn't have to shut out
> > tor-users completely, but had some control over spammers (assumed he
> > really is using tor).
> > 
> > But that's just something we came up for the wikipedia-vs-tor problem.
> > 
> > Greets
> > Fabian
> > 
> > Janne Jalkanen wrote:
> >>> Not sure if that's any help, but there'd be no counter to it unless
> >>> the jerk was willing to edit a single page every ten minutes or so in
> >>> some fashion that we couldn't identify as sub-human.
> >> Spambots already do this.  Here are the modification dates from the last
> >> ten attempts by this guy. Each and everyone from a different IP address.
> > 
> >> 2008-01-02 08:56:27
> >> 2008-01-02 13:36:48
> >> 2008-01-02 14:20:05
> >> 2008-01-02 15:46:48
> >> 2008-01-02 15:47:45
> >> 2008-01-02 23:54:31
> >> 2008-01-03 00:18:08
> >> 2008-01-03 00:37:03
> >> 2008-01-03 01:47:51
> >> 2008-01-03 06:49:43
> > 
> >> /Janne
> > 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.8 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iEYEARECAAYFAkd8tQsACgkQtC//DIQj2V8hzQCcCMMlqZi7UKgGTZKjpStiPE8J
> lK0AoJX97BfMveOegUbkhy4FAVRo5qiA
> =k2GV
> -----END PGP SIGNATURE-----

Re: AW: The guy's back...

Posted by Fabian Haupt <ka...@submerged-intelligence.de>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I quickly looked over those addresses. Just dialup ranges. Maybe we
could use something like the abusive hosts blocking list[1] to identify
the evil ones?

[1] http://www.ahbl.org/

Christoph Sauer wrote:
> No, not the same subnet. Here's a list I collected...
> 
>     <Valve className="org.apache.catalina.valves.RemoteAddrValve"
>           allow="" deny="60.10.6.170, 60.32.219.68, 60.190.243.173,
> 60.190.240.76, i61.57.40.31, 66.46.148.201, 74.231.24.2, 80.71.135.2,
> i81.201.58.55, 83.236.135.140, 87.3.58.149, 122.214.180.254, 122.252.226.40,
> 161.200.255.162, 200.226.134.53, 201.12.178.33, 202.70.201.34,
> 203.69.39.251, 207.248.164, 207.248.164.199, 210.17.247.39, 210.73.88.144,
> 216.32.162.164, 211.7.138.14, 217.149.193.70, 218.58.136.4, 222.221.6.144,
> 222.190.96.196"/>
> 
> -----Original Message-----
> From: Fabian Haupt [mailto:kazamatzuri@submerged-intelligence.de]
> Sent: Thursday, 3 January 2008 10:28
> To: jspwiki-dev@incubator.apache.org
> Subject: Re: The guy's back...
> 
> Just a thought, but maybe he's using something like TOR? Are the IP
> addresses completely unrelated or more or less in the same subnet? If he
>  just redialed his line, i figure they would.
> 
> A thing we thought of as possibility to defend those, would be to allow
> tor-edits just through some captchas. So we wouldn't have to shut out
> tor-users completely, but had some control over spammers (assumed he
> really is using tor).
> 
> But that's just something we came up for the wikipedia-vs-tor problem.
> 
> Greets
> Fabian
> 
> Janne Jalkanen wrote:
>>> Not sure if that's any help, but there'd be no counter to it unless
>>> the jerk was willing to edit a single page every ten minutes or so in
>>> some fashion that we couldn't identify as sub-human.
>> Spambots already do this.  Here are the modification dates from the last
>> ten attempts by this guy. Each and everyone from a different IP address.
> 
>> 2008-01-02 08:56:27
>> 2008-01-02 13:36:48
>> 2008-01-02 14:20:05
>> 2008-01-02 15:46:48
>> 2008-01-02 15:47:45
>> 2008-01-02 23:54:31
>> 2008-01-03 00:18:08
>> 2008-01-03 00:37:03
>> 2008-01-03 01:47:51
>> 2008-01-03 06:49:43
> 
>> /Janne
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkd8tQsACgkQtC//DIQj2V8hzQCcCMMlqZi7UKgGTZKjpStiPE8J
lK0AoJX97BfMveOegUbkhy4FAVRo5qiA
=k2GV
-----END PGP SIGNATURE-----

AW: The guy's back...

Posted by Christoph Sauer <sa...@hs-heilbronn.de>.
No, not the same subnet. Here's a list I collected...

    <Valve className="org.apache.catalina.valves.RemoteAddrValve"
          allow="" deny="60.10.6.170, 60.32.219.68, 60.190.243.173,
60.190.240.76, 61.57.40.31, 66.46.148.201, 74.231.24.2, 80.71.135.2,
81.201.58.55, 83.236.135.140, 87.3.58.149, 122.214.180.254, 122.252.226.40,
161.200.255.162, 200.226.134.53, 201.12.178.33, 202.70.201.34,
203.69.39.251, 207.248.164, 207.248.164.199, 210.17.247.39, 210.73.88.144,
216.32.162.164, 211.7.138.14, 217.149.193.70, 218.58.136.4, 222.221.6.144,
222.190.96.196"/>

-----Original Message-----
From: Fabian Haupt [mailto:kazamatzuri@submerged-intelligence.de]
Sent: Thursday, 3 January 2008 10:28
To: jspwiki-dev@incubator.apache.org
Subject: Re: The guy's back...

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Just a thought, but maybe he's using something like TOR? Are the IP
addresses completely unrelated or more or less in the same subnet? If he
 just redialed his line, i figure they would.

A thing we thought of as possibility to defend those, would be to allow
tor-edits just through some captchas. So we wouldn't have to shut out
tor-users completely, but had some control over spammers (assumed he
really is using tor).

But that's just something we came up for the wikipedia-vs-tor problem.

Greets
Fabian

Janne Jalkanen wrote:
>> Not sure if that's any help, but there'd be no counter to it unless
>> the jerk was willing to edit a single page every ten minutes or so in
>> some fashion that we couldn't identify as sub-human.
> 
> Spambots already do this.  Here are the modification dates from the last
> ten attempts by this guy. Each and everyone from a different IP address.
> 
> 2008-01-02 08:56:27
> 2008-01-02 13:36:48
> 2008-01-02 14:20:05
> 2008-01-02 15:46:48
> 2008-01-02 15:47:45
> 2008-01-02 23:54:31
> 2008-01-03 00:18:08
> 2008-01-03 00:37:03
> 2008-01-03 01:47:51
> 2008-01-03 06:49:43
> 
> /Janne

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkd8qq4ACgkQtC//DIQj2V96bQCfZxRro26j1ABlYZUEsvONcywG
bR0AnRAgb3Hwt2ds8VsNipLFxiaqfOO0
=lhlx
-----END PGP SIGNATURE-----



Re: The guy's back...

Posted by Fabian Haupt <ka...@submerged-intelligence.de>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Just a thought, but maybe he's using something like Tor? Are the IP
addresses completely unrelated or more or less in the same subnet? If he
just redialed his line, I figure they would be.

One thing we thought of as a possible defence would be to allow
Tor edits only through a captcha. That way we wouldn't have to shut out
Tor users completely, but would still have some control over spammers
(assuming he really is using Tor).

But that's just something we came up with for the Wikipedia-vs-Tor problem.

Greets
Fabian

Janne Jalkanen wrote:
>> Not sure if that's any help, but there'd be no counter to it unless
>> the jerk was willing to edit a single page every ten minutes or so in
>> some fashion that we couldn't identify as sub-human.
> 
> Spambots already do this.  Here are the modification dates from the last
> ten attempts by this guy. Each and everyone from a different IP address.
> 
> 2008-01-02 08:56:27
> 2008-01-02 13:36:48
> 2008-01-02 14:20:05
> 2008-01-02 15:46:48
> 2008-01-02 15:47:45
> 2008-01-02 23:54:31
> 2008-01-03 00:18:08
> 2008-01-03 00:37:03
> 2008-01-03 01:47:51
> 2008-01-03 06:49:43
> 
> /Janne

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkd8qq4ACgkQtC//DIQj2V96bQCfZxRro26j1ABlYZUEsvONcywG
bR0AnRAgb3Hwt2ds8VsNipLFxiaqfOO0
=lhlx
-----END PGP SIGNATURE-----

Re: The guy's back...

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
> Not sure if that's any help, but there'd be no counter to it unless
> the jerk was willing to edit a single page every ten minutes or so in
> some fashion that we couldn't identify as sub-human.

Spambots already do this.  Here are the modification dates from the
last ten attempts by this guy.  Each and every one came from a
different IP address.

2008-01-02 08:56:27
2008-01-02 13:36:48
2008-01-02 14:20:05
2008-01-02 15:46:48
2008-01-02 15:47:45
2008-01-02 23:54:31
2008-01-03 00:18:08
2008-01-03 00:37:03
2008-01-03 01:47:51
2008-01-03 06:49:43
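
The gaps between these ten attempts can be worked out directly from the timestamps. The sketch below uses only the values listed above; the function name gaps is illustrative.

```python
from datetime import datetime, timedelta

TIMESTAMPS = [
    "2008-01-02 08:56:27",
    "2008-01-02 13:36:48",
    "2008-01-02 14:20:05",
    "2008-01-02 15:46:48",
    "2008-01-02 15:47:45",
    "2008-01-02 23:54:31",
    "2008-01-03 00:18:08",
    "2008-01-03 00:37:03",
    "2008-01-03 01:47:51",
    "2008-01-03 06:49:43",
]


def gaps(stamps):
    """Return the time differences between consecutive timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


intervals = gaps(TIMESTAMPS)
short = [g for g in intervals if g < timedelta(minutes=10)]
print("shortest gap:", min(intervals))
print("gaps under ten minutes:", len(short), "of", len(intervals))
```

Only one of the nine gaps (57 seconds) is shorter than ten minutes, so a limit of one edit per ten minutes would have stopped at most one of these attempts.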

/Janne

Re: The guy's back...

Posted by Murray Altheim <mu...@altheim.com>.
Hi all,

Just got back from climbing some volcanoes. I think another approach
to this, rather than trying to identify a jerk user like this guy,
is to look for behaviours that are impossible for humans to
perform but typical of bots, such as editing more than one page within
a minute or so. If we could identify those behaviours and tag
the session as suspect, we could then, rather than simply shutting
down access, let the bot run its course but not save anything, so that
the bot thinks it's done its job but has actually had no effect.

Not sure if that's any help, but there'd be no counter to it unless
the jerk was willing to edit a single page every ten minutes or so in
some fashion that we couldn't identify as sub-human.
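
A minimal sketch of that idea, assuming a stable session identifier is available and that "more than one page within a minute" is the rule. The class name EditRateMonitor and its parameters are invented for illustration and are not JSPWiki APIs.

```python
from collections import defaultdict, deque


class EditRateMonitor:
    """Tag sessions that edit too many distinct pages within a short window."""

    def __init__(self, window_seconds=60.0, max_pages=1):
        self.window_seconds = window_seconds
        self.max_pages = max_pages
        self._recent = defaultdict(deque)  # session id -> (timestamp, page) pairs
        self._suspect = set()

    def record_edit(self, session_id, page, now):
        """Return True if the edit should be saved, False if silently dropped."""
        recent = self._recent[session_id]
        # Forget edits that have fallen out of the time window.
        while recent and now - recent[0][0] > self.window_seconds:
            recent.popleft()
        recent.append((now, page))
        if len({p for _, p in recent}) > self.max_pages:
            self._suspect.add(session_id)
        # Once tagged, a session stays suspect; the caller still answers
        # the request normally so the bot sees no difference.
        return session_id not in self._suspect
```

The caller would render the usual "page saved" response in both cases and only skip the actual save when the method returns False.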

Murray

...........................................................................
Murray Altheim <murray07 at altheim.com>                           ===  = =
http://www.altheim.com/murray/                                     = =  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = =  = =

       Boundless wind and moon - the eye within eyes,
       Inexhaustible heaven and earth - the light beyond light,
       The willow dark, the flower bright - ten thousand houses,
       Knock at any door - there's one who will respond.
                                       -- The Blue Cliff Record


Re: The guy's back...

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
> Another more rigorous approach would be to only allow authenticated  
> users to
> edit and comment (so anonymous or asserted users can no longer
> edit/comment), and expand the user registration process with an email
> confirmation step.

Yes, but that will cut down on contributing users; the threshold of  
participation goes way up.  Which I don't really like.

> But looking at the amount of spam-edits done so far, this might give a
> little bit too much collateral damage.

Well, the current statistics are:

grey:~/Projects/JSPWiki> cat /var/log/jspwiki/jspwikispam.log* | grep ACCEPTED | wc -l
1152
grey:~/Projects/JSPWiki> cat /var/log/jspwiki/jspwikispam.log* | grep REJECTED | wc -l
63259

So the spamfilter seems to be mostly working :-)
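
As a quick check of what those two counts mean, the sketch below repeats the grep-and-count step and computes the share of rejected edits. It assumes only that each log line contains the word ACCEPTED or REJECTED; the function names are illustrative.

```python
def count_verdicts(lines):
    """Count lines mentioning ACCEPTED and REJECTED, like grep piped to wc -l."""
    accepted = rejected = 0
    for line in lines:
        if "ACCEPTED" in line:
            accepted += 1
        if "REJECTED" in line:
            rejected += 1
    return accepted, rejected


def rejection_rate(accepted, rejected):
    """Return the fraction of all logged edits that were rejected."""
    total = accepted + rejected
    return rejected / total if total else 0.0


# Counts taken from the two commands above.
print(f"{rejection_rate(1152, 63259):.1%}")  # 98.2%
```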

> (I also had a quick look at the registered users, and it seems to  
> me that
> there are quite a few with invalid email adressess, I think those  
> should be
> removed, what might also be a good idea is an extra attribute that  
> holds the
> last access date of the user, this allows for removing users that  
> haven't
> been used for a long time).

Yup, I've been cleaning those away every now and then.

/Janne

Re: The guy's back...

Posted by Harry Metske <ha...@gmail.com>.
I also think it is too dangerous to offer spammers a dummy jspwiki.org;
chances are that well-meaning editors would lose their edits in the dummy
wiki.
Another, more rigorous approach would be to allow only authenticated users to
edit and comment (so anonymous or asserted users could no longer
edit/comment), and to expand the user registration process with an email
confirmation step.

But looking at the amount of spam edits done so far, this might cause a
little bit too much collateral damage.

(I also had a quick look at the registered users, and it seems to me that
quite a few have invalid email addresses. I think those should be
removed. It might also be a good idea to add an extra attribute that holds the
user's last access date; this would allow removing accounts that haven't
been used for a long time.)

regards,
Harry

2007/12/26, Janne Jalkanen <Ja...@ecyrd.com>:
>
> > not sure if it possible, but why not let him have hi userid and
> > when he logs in
> > with it, send him to a clone of jspwiki.org, or code up a 200
> > response jst for
> > him but throw away his edits ?
> >
> >
> > I know this is outside normal wiki framework, but might allow the
> > wiki to
> > continue ?
>
> May be possible, but that would need a sure-fire way of recognizing
> the userid.  And, if we can do that, we could probably stop the edits
> right away.
>
> I've been toying with the idea of collecting the IP addresses he
> comes from, and building a "permanent blocklist".  Also, it might be
> cool to have a four-layer option set for edit management:
>
> 1) approve
> 2) send suspect edit to a captcha routine
> 3) send suspect edit to a human to approve
> 4) reject outright
>
> Currently our system is not that fine-grained.
>
> /Janne
>



-- 
Kind regards,
Harry Metske
Telnr. +31-548-512395
Mobile +31-6-51898081

Re: AW: The guy's back...

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
> I've got a collection of my own already, but at a point I gave up  
> because
> there are always new ip's

Yes.  But you could build it automatically and forward any changes  
from those addresses to moderation.
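
A rough sketch of such an automatically built list: every rejected edit records its source address, and later edits from a recorded address are routed to moderation. The class name AutoBlocklist, the expiry period and the return values are assumptions; the expiry is added because dial-up addresses get reassigned.

```python
class AutoBlocklist:
    """Remember addresses of rejected edits and route later edits from them."""

    def __init__(self, ttl_seconds=7 * 24 * 3600):
        self.ttl_seconds = ttl_seconds  # assumed expiry, not from the thread
        self._last_seen = {}  # IP address -> time of last rejected edit

    def report_spam(self, ip, now):
        """Record that the spam filter rejected an edit from this address."""
        self._last_seen[ip] = now

    def route(self, ip, now):
        """Return 'moderate' for recently reported addresses, else 'normal'."""
        seen = self._last_seen.get(ip)
        if seen is not None and now - seen <= self.ttl_seconds:
            return "moderate"
        return "normal"
```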

> What about a shared file that only
> trusted webadmins could access (login for download and ftp upload  
> goes only
> to trusted admins, or maybe a shared hsqldb account?) and that  
> would be used
> by the spamfilter then...?

Yes, that would work.  Until, of course, the spammers DoS the
service.

/Janne

AW: The guy's back...

Posted by Christoph Sauer <sa...@hs-heilbronn.de>.
>I've been toying with the idea of collecting the IP addresses he  
>comes from, and building a "permanent blocklist".  

I've got a collection of my own already, but at some point I gave up because
there are always new IPs.

What about a shared file that only
trusted web admins could access (login for download and FTP upload restricted
to trusted admins, or maybe a shared hsqldb account?), which the
spamfilter would then use?

/Janne



Re: The guy's back...

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
> not sure if it possible, but why not let him have hi userid and  
> when he logs in
> with it, send him to a clone of jspwiki.org, or code up a 200  
> response jst for
> him but throw away his edits ?
>
>
> I know this is outside normal wiki framework, but might allow the  
> wiki to
> continue ?

May be possible, but that would need a sure-fire way of recognizing  
the userid.  And, if we can do that, we could probably stop the edits  
right away.

I've been toying with the idea of collecting the IP addresses he  
comes from, and building a "permanent blocklist".  Also, it might be  
cool to have a four-layer option set for edit management:

1) approve
2) send suspect edit to a captcha routine
3) send suspect edit to a human to approve
4) reject outright

Currently our system is not that fine-grained.
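
The four layers could be modelled roughly as below, assuming the spam filter produces a score between 0 and 1. The Verdict names follow the list above; the classify function and its threshold values are hypothetical and would need tuning.

```python
from enum import Enum


class Verdict(Enum):
    APPROVE = 1   # save the edit
    CAPTCHA = 2   # send suspect edit to a captcha routine
    MODERATE = 3  # send suspect edit to a human to approve
    REJECT = 4    # reject outright


def classify(score, captcha_at=0.3, moderate_at=0.6, reject_at=0.9):
    """Map a spam score in [0, 1] to one of the four verdicts."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= reject_at:
        return Verdict.REJECT
    if score >= moderate_at:
        return Verdict.MODERATE
    if score >= captcha_at:
        return Verdict.CAPTCHA
    return Verdict.APPROVE
```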

/Janne

Re: The guy's back...

Posted by Alex Samad <al...@samad.com.au>.
On Tue, Dec 25, 2007 at 11:15:41PM +0200, Janne Jalkanen wrote:
>
> No, he's not.  He's using a rotating proxy.  All edits come from different 
> IP addresses.
>
> /Janne
Not sure if it's possible, but why not let him have his userid and, when he logs in
with it, send him to a clone of jspwiki.org, or code up a 200 response just for
him but throw away his edits?


I know this is outside the normal wiki framework, but it might allow the wiki to
continue.

Alex

>
> On 25 Dec 2007, at 23:06, Harry Metske wrote:
>
>> and if he's coming from the same IP address every time, can you put an IP
>> ban on it in the meantime ?
>>
>> regards,
>> Harry
>>
>> 2007/12/25, Janne Jalkanen <Ja...@ecyrd.com>:
>>>
>>> Remember the guy who just posts random words on jspwiki.org,
>>> presumably to poison bayesian spam filters?
>>>
>>> Well, he's back, and he seems to have fixed his system to correct for
>>> UTF-8 (which was an easy way to catch the bot).
>>>
>>> I have to admit that this is no longer fun.  My next idea is to make
>>> a Javascript-based, automated captcha (if you've got JS on, the
>>> captcha gets automatically filled, so normally, the user does not
>>> have to care.  If you've got JS off, then you have to fill it
>>> yourself.  But after that, I'm running out of ideas.  I would really
>>> hate to close down jspwiki.org, make registration mandatory, and have
>>> all registrations moderated.
>>>
>>> (Also, the guy is now posting random words, *amended* with random
>>> snippets of the content of the same page.  Therefore just checking
>>> for a single word is not possible.)
>>>
>>> /Janne
>>>
>>
>>
>>
>> -- 
>> met vriendelijke groet,
>> Harry Metske
>> Telnr. +31-548-512395
>> Mobile +31-6-51898081
>
>

Re: The guy's back...

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
No, he's not.  He's using a rotating proxy.  All edits come from  
different IP addresses.

/Janne

On 25 Dec 2007, at 23:06, Harry Metske wrote:

> and if he's coming from the same IP address every time, can you put  
> an IP
> ban on it in the meantime ?
>
> regards,
> Harry
>
> 2007/12/25, Janne Jalkanen <Ja...@ecyrd.com>:
>>
>> Remember the guy who just posts random words on jspwiki.org,
>> presumably to poison bayesian spam filters?
>>
>> Well, he's back, and he seems to have fixed his system to correct for
>> UTF-8 (which was an easy way to catch the bot).
>>
>> I have to admit that this is no longer fun.  My next idea is to make
>> a Javascript-based, automated captcha (if you've got JS on, the
>> captcha gets automatically filled, so normally, the user does not
>> have to care.  If you've got JS off, then you have to fill it
>> yourself.  But after that, I'm running out of ideas.  I would really
>> hate to close down jspwiki.org, make registration mandatory, and have
>> all registrations moderated.
>>
>> (Also, the guy is now posting random words, *amended* with random
>> snippets of the content of the same page.  Therefore just checking
>> for a single word is not possible.)
>>
>> /Janne
>>
>
>
>
> -- 
> met vriendelijke groet,
> Harry Metske
> Telnr. +31-548-512395
> Mobile +31-6-51898081


Re: The guy's back...

Posted by Harry Metske <ha...@gmail.com>.
And if he's coming from the same IP address every time, can you put an IP
ban on it in the meantime?

regards,
Harry

2007/12/25, Janne Jalkanen <Ja...@ecyrd.com>:
>
> Remember the guy who just posts random words on jspwiki.org,
> presumably to poison bayesian spam filters?
>
> Well, he's back, and he seems to have fixed his system to correct for
> UTF-8 (which was an easy way to catch the bot).
>
> I have to admit that this is no longer fun.  My next idea is to make
> a Javascript-based, automated captcha (if you've got JS on, the
> captcha gets automatically filled, so normally, the user does not
> have to care.  If you've got JS off, then you have to fill it
> yourself.  But after that, I'm running out of ideas.  I would really
> hate to close down jspwiki.org, make registration mandatory, and have
> all registrations moderated.
>
> (Also, the guy is now posting random words, *amended* with random
> snippets of the content of the same page.  Therefore just checking
> for a single word is not possible.)
>
> /Janne
>



-- 
Kind regards,
Harry Metske
Telnr. +31-548-512395
Mobile +31-6-51898081