Posted to user@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2006/04/28 15:23:41 UTC
Admin Gui beta test (was Re: ATB: Heritrix)
Hi there,
since building the GUI is somewhat complicated, I was thinking about
providing a ready-to-use binary.
This might help to get some more of the beta testers we are currently
looking for.
Any thoughts?
However, I am afraid that this would hit my server too hard, and I have to
pay for traffic. :-/
Does anyone have an idea where we can mirror this file for free?
Any volunteer is very welcome.
Thanks.
Stefan
On 28.04.2006 at 15:14, Aled Jones wrote:
> Thanks for your replies guys. I hadn't realised that the admin gui was
> already in development.
> We should be able to cope till it gets released ;-)
>
> Thanks again
> Aled
>
>> -----Original Message-----
>> From: Dan Morrill [mailto:ralph.morrill@baker.edu]
>> Sent: 28 April 2006 14:07
>> To: nutch-user@lucene.apache.org
>> Subject: RE: Heritrix
>>
>> Aled,
>>
>> I used heritrix before going over to nutch, while it is an
>> excellent program, with lots of good things to offer, it
>> didn't quite meet my need, and when designing the
>> architecture had too many dependencies for me to be comfortable with.
>>
>> If you want to run an internet archive though, heritrix can
>> not be beat, if you want to run a search engine, nutch is a
>> good choice.
>>
>> My personal opinion.
>> r/d
>>
>> -----Original Message-----
>> From: Aled Jones [mailto:Aled.Jones@comtec-europe.co.uk]
>> Sent: Friday, April 28, 2006 1:59 AM
>> To: nutch-user@lucene.apache.org
>> Subject: Heritrix
>>
>> Hi
>>
>> Anyone used Heritrix (http://crawler.archive.org/) as a
>> crawler? How does it compare with the Nutch crawler? Can
>> Nutch serve its crawled results? Main reason I'm interested is
>> that it has a WUI interface that might make maintenance for the
>> IT guys easier, although I know that some of you guys are
>> working on an interface.
>>
>> Cheers
>> Aled
>>
>>
>> ###########################################
>>
>> This message has been scanned by F-Secure Anti-Virus for
>> Microsoft Exchange.
>> For more information, connect to http://www.f-secure.com/
>> ************************************************************************
>> This e-mail and any attachments are strictly confidential and
>> intended solely for the addressee. They may contain
>> information which is covered by legal, professional or other
>> privilege. If you are not the intended addressee, you must
>> not copy the e-mail or the attachments, or use them for any
>> purpose or disclose their contents to any other person. To do
>> so may be unlawful. If you have received this transmission in
>> error, please notify us as soon as possible and delete the
>> message and attachments from all places in your computer
>> where they are stored.
>>
>> Although we have scanned this e-mail and any attachments for
>> viruses, it is your responsibility to ensure that they are
>> actually virus free.
>>
>>
>>
>>
---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com
Re: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by "Insurance Squared Inc." <gc...@insurancesquared.com>.
I'd prefer not to make a long-term commitment, but if a 2 Mbit connection
is good enough and this is a short-term thing, I'll step up if no one
else can. I could probably make a longer-term commitment in a few
weeks. Worst case, I can host it at home.
glenn
gekkokid wrote:
> what about putting it on sourceforge?
> http://sourceforge.net/projects/nutch
>
>
>
> ----- Original Message ----- From: "Doug Cutting" <cu...@apache.org>
> To: <nu...@lucene.apache.org>
> Sent: Saturday, April 29, 2006 12:18 AM
> Subject: Re: Admin Gui beta test (was Re: ATB: Heritrix)
>
>
>> Stefan Groschupf wrote:
>>
>>>> If that fails, you could attach it to a page on the Wiki, no?
>>>
>>>
>>> Is that a good idea? The file is that big and we already got the
>>> request to use the apache mirror servers.
>>
>>
>> The Apache mirrors are really for signed, Apache releases, which this
>> is not. Is it too big for the wiki?
>>
>> Doug
>>
>
Re: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by gekkokid <me...@gekkokid.org.uk>.
what about putting it on sourceforge?
http://sourceforge.net/projects/nutch
----- Original Message -----
From: "Doug Cutting" <cu...@apache.org>
To: <nu...@lucene.apache.org>
Sent: Saturday, April 29, 2006 12:18 AM
Subject: Re: Admin Gui beta test (was Re: ATB: Heritrix)
> Stefan Groschupf wrote:
>>> If that fails, you could attach it to a page on the Wiki, no?
>>
>> Is that a good idea? The file is that big and we already got the
>> request to use the apache mirror servers.
>
> The Apache mirrors are really for signed, Apache releases, which this is
> not. Is it too big for the wiki?
>
> Doug
>
Re: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by Doug Cutting <cu...@apache.org>.
Stefan Groschupf wrote:
>> If that fails, you could attach it to a page on the Wiki, no?
>
> Is that a good idea? The file is that big and we already got the
> request to use the apache mirror servers.
The Apache mirrors are really for signed, Apache releases, which this is
not. Is it too big for the wiki?
Doug
Re: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by gekkokid <me...@gekkokid.org.uk>.
use yousendit.com and post the link here :)
----- Original Message -----
From: "Karsten Dello" <de...@mi.fu-berlin.de>
To: <nu...@lucene.apache.org>
Sent: Tuesday, May 02, 2006 8:50 PM
Subject: Re: Admin Gui beta test (was Re: ATB: Heritrix)
> Hi Stefan,
>
> did you find a solution?
> I'd really like to give the admin gui a try.
>
> Cheers
>
> Karsten
>
>
> PS: My offer to host that file is still open :-)
>
> Stefan Groschupf wrote:
>
>>
>>>> I think it should be possible to put your binary at the Apache site,
>>>> probably Doug will be the right person to talk to ...
>>>
>>>
>>> Have you tried attaching it to a Jira issue?
>>
>> The nutch -xxxtar.gz is 67MB. The maximum file upload size is 10.00 Mb.
>>
>>>
>>> If that fails, you could attach it to a page on the Wiki, no?
>>
>> Is that a good idea? The file is that big and we already got the request
>> to use the apache mirror servers.
>> Anyway I already got some offline offers from people, just was thinking
>> it is a good idea to leave that running under the nutch project flag.
Re: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by Karsten Dello <de...@mi.fu-berlin.de>.
Hi Stefan,
did you find a solution?
I'd really like to give the admin gui a try.
Cheers
Karsten
PS: My offer to host that file is still open :-)
Stefan Groschupf wrote:
>
>>> I think it should be possible to put your binary at the Apache site,
>>> probably Doug will be the right person to talk to ...
>>
>>
>> Have you tried attaching it to a Jira issue?
>
> The nutch -xxxtar.gz is 67MB. The maximum file upload size is 10.00 Mb.
>
>>
>> If that fails, you could attach it to a page on the Wiki, no?
>
> Is that a good idea? The file is that big and we already got the
> request to use the apache mirror servers.
> Anyway I already got some offline offers from people, just was thinking
> it is a good idea to leave that running under the nutch project flag.
Re: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by Stefan Groschupf <sg...@media-style.com>.
>> I think it should be possible to put your binary at the Apache
>> site, probably Doug will be the right person to talk to ...
>
> Have you tried attaching it to a Jira issue?
The nutch -xxxtar.gz is 67MB. The maximum file upload size is 10.00 Mb.
>
> If that fails, you could attach it to a page on the Wiki, no?
Is that a good idea? The file is that big and we already got the
request to use the apache mirror servers.
Anyway I already got some offline offers from people, just was
thinking it is a good idea to leave that running under the nutch
project flag.
Re: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by Doug Cutting <cu...@apache.org>.
Andrzej Bialecki wrote:
> I think it should be possible to put your binary at the Apache site,
> probably Doug will be the right person to talk to ...
Have you tried attaching it to a Jira issue?
If that fails, you could attach it to a page on the Wiki, no?
Doug
Re: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by Andrzej Bialecki <ab...@getopt.org>.
Stefan Groschupf wrote:
> Hi there,
>
> since building the GUI is somewhat complicated, I was thinking about
> providing a ready-to-use binary.
> This might help to get some more of the beta testers we are currently
> looking for.
> Any thoughts?
>
> However, I am afraid that this would hit my server too hard, and I have to
> pay for traffic. :-/
> Does anyone have an idea where we can mirror this file for free?
> Any volunteer is very welcome.
I think it should be possible to put your binary at the Apache site,
probably Doug will be the right person to talk to ...
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
RE: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by Dan Morrill <ra...@baker.edu>.
Stefan -
I can host the file at http://www.oaktreesecurity.com if you would like. I
have about 2 gigs of bandwidth a month, and I use maybe 10 megs, so I think I
can accommodate it. I am more than happy to host a free-standing binary.
Do you have a Windows-compatible version (or will it run in Cygwin), or is
it Linux only?
r/d
-----Original Message-----
From: Stefan Groschupf [mailto:sg@media-style.com]
Sent: Friday, April 28, 2006 6:24 AM
To: nutch-user@lucene.apache.org
Subject: Admin Gui beta test (was Re: ATB: Heritrix)
Hi there,
since building the GUI is somewhat complicated, I was thinking about
providing a ready-to-use binary.
This might help to get some more of the beta testers we are currently
looking for.
Any thoughts?
However, I am afraid that this would hit my server too hard, and I have to
pay for traffic. :-/
Does anyone have an idea where we can mirror this file for free?
Any volunteer is very welcome.
Thanks.
Stefan
Re: Admin Gui beta test (was Re: ATB: Heritrix)
Posted by sudhendra seshachala <su...@yahoo.com>.
Hi Stefan
I would be willing to host the app.
I have a virtual dedicated server from Godaddy with Fedora Core 2, with an Apache web server and Tomcat running.
The IP address is http://68.178.249.66. Right now, on the web server side, I have a default page (hosted by Godaddy) running,
but I can make sure the Admin GUI is running. I might need some help, but it should not be a problem at all.
Thanks
Sudhi
Stefan Groschupf <sg...@media-style.com> wrote:
Hi there,
since building the GUI is somewhat complicated, I was thinking about
providing a ready-to-use binary.
This might help to get some more of the beta testers we are currently
looking for.
Any thoughts?
However, I am afraid that this would hit my server too hard, and I have to
pay for traffic. :-/
Does anyone have an idea where we can mirror this file for free?
Any volunteer is very welcome.
Thanks.
Stefan
Sudhi Seshachala
http://sudhilogs.blogspot.com/