Posted to user@nutch.apache.org by ps1c5o <rf...@hotmail.com> on 2008/07/04 01:17:39 UTC

deducing web crawler behavior from access.log files

I don't know if this is the right place to ask, but... if not, sorry.

Like the title says, I need to be able to deduce web crawler behavior from
the access log.
In particular, I need to understand what this means:

xx.xx.xx.x - - [12/Jun/2008:21:10:31 +0100] "GET /phpmyadmin/main.php HTTP/1.0" 404 1123 "-" "-"

xx.xx.x.xx - - [12/Jun/2008:21:10:31 +0100] "GET /phpMyAdmin/main.php HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:31 +0100] "GET /db/main.php HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /web/main.php HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /PMA/main.php HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /admin/main.php HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:33 +0100] "GET /dbadmin/main.php HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:33 +0100] "GET /PMA2006/main.php HTTP/1.0" 404 1123 "-" "-"

xxx.xxx.xx.xx - - [12/Jun/2008:21:10:34 +0100] "GET /pma2006/main.php HTTP/1.0" 404 1123 "-" "-"

xx.xx.xx.xx - - [12/Jun/2008:21:10:34 +0100] "GET /sqlmanager/main.php HTTP/1.0" 404 1123 "-" "-"


where I replaced the IP addresses with x's for privacy's sake.
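
For anyone reading along, here is a rough sketch of how one of these entries breaks down into fields. It assumes the server writes the Apache "combined" log format, which the excerpt resembles but which is not confirmed anywhere in this thread. The names LOG_PATTERN and parse_line are made up for illustration.

```python
import re

# Assumption: Apache "combined" log format. Field names are illustrative only.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# One entry from the excerpt, joined onto a single line.
entry = parse_line(
    'xx.xx.xx.x - - [12/Jun/2008:21:10:31 +0100] '
    '"GET /phpmyadmin/main.php HTTP/1.0" 404 1123 "-" "-"'
)
print(entry["path"], entry["status"], repr(entry["referer"]), repr(entry["agent"]))
```

Read this way, each entry is a GET for a main.php under a different folder, answered with 404 (not found), with no referer and no user agent sent by the client.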

This is just an extract... there are probably over 200 similar lines, in
which the crawler tries to get a main.php file from hundreds of different
paths, most of them including a folder named phpmyadmin or something
similar.

Is this an attempt to attack the machine? Why does it want the main.php file
so badly?

Thanks in advance.
-- 
View this message in context: http://www.nabble.com/deducing-web-crawler-behavior-from-access.log-files-tp18269957p18269957.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: deducing web crawler behavior from access.log files

Posted by Kunthar <ku...@gmail.com>.
Not the right place to ask :)

Basically, this is an HTTP scan probing for vulnerable web applications
(judging by the paths, exposed phpMyAdmin installs). And yes, this is an
attack attempt.
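
If you want to spot this kind of scan automatically, a minimal sketch is to count, per client, how many distinct paths came back 404. Everything here is an assumption for illustration: the function name find_probers, the threshold of 3, and the sample addresses (taken from the reserved documentation ranges, since the real IPs were masked).

```python
import re
from collections import defaultdict

# Captures client address, requested path, and status code from one log line.
REQUEST = re.compile(r'^(\S+) .*?"(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) ')


def find_probers(lines, min_distinct_404s=3):
    """Map each client to its distinct 404 paths, keeping only heavy hitters."""
    missing = defaultdict(set)
    for line in lines:
        m = REQUEST.match(line)
        if m and m.group(3) == "404":
            missing[m.group(1)].add(m.group(2))
    return {
        host: paths
        for host, paths in missing.items()
        if len(paths) >= min_distinct_404s
    }


# Hypothetical sample: one prober and one ordinary visitor.
sample = [
    '192.0.2.10 - - [12/Jun/2008:21:10:31 +0100] "GET /phpmyadmin/main.php HTTP/1.0" 404 1123 "-" "-"',
    '192.0.2.10 - - [12/Jun/2008:21:10:31 +0100] "GET /db/main.php HTTP/1.0" 404 1123 "-" "-"',
    '192.0.2.10 - - [12/Jun/2008:21:10:32 +0100] "GET /PMA/main.php HTTP/1.0" 404 1123 "-" "-"',
    '198.51.100.7 - - [12/Jun/2008:21:11:02 +0100] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

probers = find_probers(sample)
for host, paths in probers.items():
    print(host, sorted(paths))
```

A real log would need a higher threshold and probably a time window, since a broken link can also produce a handful of 404s from an innocent client.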

Check http://packetstormsecurity.org/ and http://www.milw0rm.com/

Peace,
Kunth


