Posted to user@nutch.apache.org by ad...@interfree.it on 2005/09/29 16:28:59 UTC

Problem fetching dynamic pages

Hi, I have a question about the Nutch crawler:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

I want to index and search the documents on a site that is accessed with authentication (user/password).
Once the login is done, the first page displayed by the application is composed of two frames:

<HTML>
<HEAD>
<TITLE>Sistema Provvedimenti -    SUPER</TITLE>
</HEAD>
<FRAMESET ROWS="14%,*">
<FRAME NORESIZE NAME="MENU" SRC="Servlet1?menu=1" SCROLLING="AUTO">
<FRAME NAME="PAGE" SRC="../a.html" SCROLLING="AUTO">
</FRAMESET>
</HTML>

The servlet "Servlet1" publishes a table with one row and N columns,
where every column contains an href with the URL of another servlet (Servlet2 through ServletN).
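
To make the link structure above concrete, here is a minimal Python sketch (not Nutch code) that collects the FRAME SRC and A HREF targets from the two pages and resolves them against a base URL. The host example.com, the /app/main/ path, the Servlet1 table markup, and the query strings on Servlet2 and Servlet3 are all assumptions made up for illustration; only the frameset itself comes from the application.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects frame sources and anchor hrefs as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "frame" and attrs.get("src"):
            self.links.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical location of the frameset page (assumption).
FRAMESET_URL = "http://example.com/app/main/index.html"
FRAMESET = """
<HTML><HEAD><TITLE>Sistema Provvedimenti - SUPER</TITLE></HEAD>
<FRAMESET ROWS="14%,*">
<FRAME NORESIZE NAME="MENU" SRC="Servlet1?menu=1" SCROLLING="AUTO">
<FRAME NAME="PAGE" SRC="../a.html" SCROLLING="AUTO">
</FRAMESET></HTML>
"""

# Hypothetical Servlet1 output: one row, one href per column (assumption).
SERVLET1_URL = "http://example.com/app/main/Servlet1?menu=1"
SERVLET1_OUTPUT = """
<TABLE><TR>
<TD><A HREF="Servlet2?op=list">Two</A></TD>
<TD><A HREF="Servlet3?op=list">Three</A></TD>
</TR></TABLE>
"""

frame_links = extract_links(FRAMESET, FRAMESET_URL)
menu_links = extract_links(SERVLET1_OUTPUT, SERVLET1_URL)
print(frame_links)
print(menu_links)
```

If the real Servlet1 output builds its links differently (for example with JavaScript, or only when a session cookie is present), a plain HTML parser like this one would find nothing in it, which could be one thing to check.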

DESCRIPTION OF THE PROBLEM:

My problem is that I see the crawler fetch the login page, the static page a.html, and the servlet Servlet1, but it does not fetch any of the other servlets (Servlet2 through ServletN).
If instead I put the hrefs in the page a.html, Nutch manages to fetch the URLs and everything works.


DESCRIPTION OF OUR NUTCH CONFIGURATION:
I installed Nutch 0.6. I launch Nutch this way:
/usr/nutch-0.6/bin/nutch crawl url -dir index -depth 10 -threads 8 >& crawl.log

where the file "url" contains only the URL of the site, with just the login and password.

I modified the Nutch configuration file "crawl-urlfilter.txt" as follows:

-^(ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|rtf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe)$
+[?&=]
+.
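
As a rough check of the rules above, the following Python sketch (not Nutch code) simulates how I understand the regex URL filter to behave: rules are tried from top to bottom, the first pattern found anywhere in the URL decides, "+" accepts and "-" rejects, and a URL that matches no rule is rejected. That evaluation order is an assumption on my part, and the example.com URLs are hypothetical.

```python
import re

# The rules from crawl-urlfilter.txt, copied as-is.
RULES = r"""
-^(ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|rtf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe)$
+[?&=]
+.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(url, rules):
    """First matching rule wins; no match means the URL is rejected."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


rules = parse_rules(RULES)

# Hypothetical URLs, for illustration only.
for url in (
    "http://example.com/app/main/Servlet2?menu=2",
    "http://example.com/app/a.html",
    "http://example.com/images/logo.gif",
    "mailto:someone@example.com",
):
    print(url, accepts(url, rules))
```

Under these assumptions a servlet URL with a query string is accepted by the "+[?&=]" rule, so the filter file alone would not seem to explain why Servlet2 through ServletN are skipped.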

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Please, somebody help me! It is very important to me.

                                                            Adriano Palombo


