Posted to user@nutch.apache.org by Aled Jones <Al...@comtec-europe.co.uk> on 2006/04/28 10:58:33 UTC

Heritrix

Hi

Has anyone used Heritrix (http://crawler.archive.org/) as a crawler?  How
does it compare with the Nutch crawler?  Can Nutch serve its crawled
results?  The main reason I'm interested is that it has a web UI (WUI)
that might make maintenance easier for the IT guys, although I know that
some of you are working on an interface.

Cheers
Aled


###########################################

This message has been scanned by F-Secure Anti-Virus for Microsoft Exchange.
For more information, connect to http://www.f-secure.com/
************************************************************************
This e-mail and any attachments are strictly confidential and intended solely for the addressee. They may contain information which is covered by legal, professional or other privilege. If you are not the intended addressee, you must not copy the e-mail or the attachments, or use them for any purpose or disclose their contents to any other person. To do so may be unlawful. If you have received this transmission in error, please notify us as soon as possible and delete the message and attachments from all places in your computer where they are stored. 

Although we have scanned this e-mail and any attachments for viruses, it is your responsibility to ensure that they are actually virus free.
 


Re: Heritrix

Posted by Dima Mazmanov <nu...@proservice.ge>.
But the admin GUI will be in version 0.8.
----- Original Message ----- 
From: "Zaheed Haque" <za...@gmail.com>
To: <nu...@lucene.apache.org>
Sent: Friday, April 28, 2006 1:05 PM
Subject: Re: Heritrix


Hi:

Nutch will soon have an admin GUI, thanks to Stefan!

http://issues.apache.org/jira/browse/NUTCH-251

Cheers

On 4/28/06, Aled Jones <Al...@comtec-europe.co.uk> wrote:
> Hi
>
> Anyone used Heritrix (http://crawler.archive.org/) as a crawler?  How
> does it compare with the Nutch crawler?  Can Nutch serve its crawled
> results?   Main reason I'm interested is that it has a WUI interface
> that might make maintenance for the IT guys easier, although I know that
> some of you guys are working on an interface.
>
> Cheers
> Aled
>
>



Re: Heritrix

Posted by Zaheed Haque <za...@gmail.com>.
Hi:

Nutch will soon have an admin GUI, thanks to Stefan!

http://issues.apache.org/jira/browse/NUTCH-251

Cheers

On 4/28/06, Aled Jones <Al...@comtec-europe.co.uk> wrote:
> Hi
>
> Anyone used Heritrix (http://crawler.archive.org/) as a crawler?  How
> does it compare with the Nutch crawler?  Can Nutch serve its crawled
> results?   Main reason I'm interested is that it has a WUI interface
> that might make maintenance for the IT guys easier, although I know that
> some of you guys are working on an interface.
>
> Cheers
> Aled
>
>

RE: Heritrix

Posted by Dan Morrill <ra...@baker.edu>.
Aled,

I used Heritrix before moving over to Nutch. While it is an excellent
program with lots of good things to offer, it didn't quite meet my needs,
and it had too many dependencies for me to be comfortable with when
designing the architecture.

If you want to run an internet archive, though, Heritrix cannot be beat; if
you want to run a search engine, Nutch is a good choice.

My personal opinion. 
r/d

-----Original Message-----
From: Aled Jones [mailto:Aled.Jones@comtec-europe.co.uk] 
Sent: Friday, April 28, 2006 1:59 AM
To: nutch-user@lucene.apache.org
Subject: Heritrix

Hi

Anyone used Heritrix (http://crawler.archive.org/) as a crawler?  How
does it compare with the Nutch crawler?  Can Nutch serve its crawled
results?   Main reason I'm interested is that it has a WUI interface
that might make maintenance for the IT guys easier, although I know that
some of you guys are working on an interface.

Cheers
Aled





Warning: Con Man

Posted by David Webster <tr...@loxinfo.co.th>.
There is one who posts periodically on this list.  Took my $$$ and
wouldn't deliver.

If you want to know who he is, send me a private email.