Posted to user@nutch.apache.org by Justin Hartman <jj...@gmail.com> on 2007/01/03 13:39:54 UTC

Google Search on Nutch?

I'm sorry but I have to ask this question - stupid as it may seem....

Why does the Nutch home page [1] have Google Search integrated into
the site when surely it should be using Nutch? What better a
demonstration of the Nutch system than the Nutch home page?
-- 
Regards
Justin Hartman
PGP Key ID: 102CC123

[1] http://lucene.apache.org/nutch/

Re: Google Search on Nutch?

Posted by Lukas Vlcek <lu...@gmail.com>.
Hi,

I am not actively involved in this project, so my answer may not be correct,
but I would say that integrating Google search into the Nutch website is a much
easier way to go (and cheaper as well). You don't need disk resources to
store the Nutch index, and you don't need a dedicated admin who
checks/maintains the crawler, etc.

You can search the web/mailing lists for applications powered by Nutch if you
need references.

Lukas

On 1/3/07, Justin Hartman <jj...@gmail.com> wrote:
>
> I'm sorry but I have to ask this question - stupid as it may seem....
>
> Why does the Nutch home page [1] have Google Search integrated into
> the site when surely it should be using Nutch? What better a
> demonstration of the Nutch system than the Nutch home page?
> --
> Regards
> Justin Hartman
> PGP Key ID: 102CC123
>
> [1] http://lucene.apache.org/nutch/
>

Re: Nutch zone (was Re: Google Search on Nutch?)

Posted by Jim Wilson <wi...@gmail.com>.
> However, Nutch could use its zone server to add a demonstration for the
> community, e.g. like http://lenya.zones.apache.org/#otherDemos or
> http://forrest.zones.apache.org/, but I guess that is a dev topic.

+1

Is there a reason not to use the zone server in this manner?

-- Jim

On 1/11/07, Thorsten Scherler <th...@juntadeandalucia.es> wrote:
>
> On Wed, 2007-01-03 at 14:39 +0200, Justin Hartman wrote:
> > I'm sorry but I have to ask this question - stupid as it may seem....
> >
> > Why does the Nutch home page [1] have Google Search integrated into
> > the site when surely it should be using Nutch?
>
> See the source code of the page:
> <meta content="Apache Forrest" name="Generator">
> <meta name="Forrest-version" content="0.7">
> <meta name="Forrest-skin-name" content="pelt">
>
> Forrest does not yet have an explicit Nutch search interface;
> however, Forrest supports searching against a Lucene index out of the box.
>
> Furthermore, I just added a Solr plugin to Forrest; my point is that Forrest
> could easily be extended to use Nutch, but ...
>
> > What better a
> > demonstration of the Nutch system than the Nutch home page?
>
> Well, on people.apache.org, where all websites of http://apache.org/ are
> hosted, no Java is allowed, AFAIR.
>
> However, Nutch could use its zone server to add a demonstration for the
> community, e.g. like http://lenya.zones.apache.org/#otherDemos or
> http://forrest.zones.apache.org/, but I guess that is a dev topic.
>
> HTH
>
> salu2
>
> --
> thorsten
>
> "Together we stand, divided we fall!"
> Hey you (Pink Floyd)
>
>

Nutch zone (was Re: Google Search on Nutch?)

Posted by Thorsten Scherler <th...@juntadeandalucia.es>.
On Wed, 2007-01-03 at 14:39 +0200, Justin Hartman wrote:
> I'm sorry but I have to ask this question - stupid as it may seem....
> 
> Why does the Nutch home page [1] have Google Search integrated into
> the site when surely it should be using Nutch? 

See the source code of the page:
<meta content="Apache Forrest" name="Generator">
<meta name="Forrest-version" content="0.7">
<meta name="Forrest-skin-name" content="pelt">

Forrest does not yet have an explicit Nutch search interface;
however, Forrest supports searching against a Lucene index out of the box.

Furthermore, I just added a Solr plugin to Forrest; my point is that Forrest
could easily be extended to use Nutch, but ...

> What better a
> demonstration of the Nutch system than the Nutch home page?

Well, on people.apache.org, where all websites of http://apache.org/ are
hosted, no Java is allowed, AFAIR.

However, Nutch could use its zone server to add a demonstration for the
community, e.g. like http://lenya.zones.apache.org/#otherDemos or
http://forrest.zones.apache.org/, but I guess that is a dev topic.

HTH

salu2

-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)


Re: Google Search on Nutch?

Posted by Zaheed Haque <za...@gmail.com>.
Follow the whole thread:

http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200601.mbox/%3c43D7EC8A.9050506@nutch.org%3e

This is already in progress as far as I know. It is just not completed yet.

Cheers

On 1/4/07, Nitin Borwankar <ni...@borwankar.com> wrote:
> Phillip Rhodes wrote:
>
> > Not to be a wet blanket, but I have never heard of or seen Apache running
> > a Java process on any of their servers.  Feel free to prove me wrong!
> >
> > My idea to make this happen would be that some of us could "donate"
> > the use of a server/bandwidth to run Nutch.
> > What would be "super ultra cool" is if a few of us donated a
> > dedicated JVM to run the Nutch indexer/webapp and we could round-robin
> > between them.  I will be the first to volunteer my resources for this
> > purpose.  I have a co-location, and a commercial SDSL connection with
> > 2 racks of servers...
>
>
> Glad to do the same - I have an instance on Amazon EC2.  I have already
> experimented with single site indexing for some friends.
>
> Nitin
>
>
>
> --
> Nitin Borwankar
> Find, Learn, Act ....
> Greener, the search engine for the planet
> http://greener.com
> nitin@borwankar.com
> 510-872-7066
>
>

Re: Google Search on Nutch?

Posted by Nitin Borwankar <ni...@borwankar.com>.
Phillip Rhodes wrote:

> Not to be a wet blanket, but I have never heard of or seen Apache running
> a Java process on any of their servers.  Feel free to prove me wrong!
>
> My idea to make this happen would be that some of us could "donate"
> the use of a server/bandwidth to run Nutch.
> What would be "super ultra cool" is if a few of us donated a
> dedicated JVM to run the Nutch indexer/webapp and we could round-robin
> between them.  I will be the first to volunteer my resources for this
> purpose.  I have a co-location, and a commercial SDSL connection with
> 2 racks of servers...


Glad to do the same - I have an instance on Amazon EC2.  I have already 
experimented with single site indexing for some friends.

Nitin



-- 
Nitin Borwankar
Find, Learn, Act .... 
Greener, the search engine for the planet
http://greener.com
nitin@borwankar.com
510-872-7066


Re: Google Search on Nutch?

Posted by Phillip Rhodes <sp...@rhoderunner.com>.
Not to be a wet blanket, but I have never heard of or seen Apache running a
Java process on any of their servers.  Feel free to prove me wrong!

My idea to make this happen would be that some of us could "donate" the
use of a server/bandwidth to run Nutch.

What would be "super ultra cool" is if a few of us donated a dedicated
JVM to run the Nutch indexer/webapp and we could round-robin between
them.  I will be the first to volunteer my resources for this purpose.
I have a co-location, and a commercial SDSL connection with 2 racks of
servers...



Phillip

Andrzej Bialecki wrote:

> Jim Wilson wrote:
>
>> Andrzej Bialecki wrote:
>>
>>> ... familiar enough with Apache infrastructure in order to set it up
>>
>>
>> Could you expand a little on the experience needed?  Is this in regards to
>> tying Apache to Tomcat?  (I remember ages ago this used to be done with
>> something called "mod_jk" but who knows anymore).
>
>
> No, I meant the apache.org infrastructure, i.e. a person (a committer) who
> is familiar enough with both Nutch and the local infrastructure at
> apache.org that he could set it up.
>



Re: Google Search on Nutch?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Jim Wilson wrote:
> Andrzej Bialecki wrote:
>> ... familiar enough with Apache infrastructure in order to set it up
>
> Could you expand a little on the experience needed?  Is this in regards to
> tying Apache to Tomcat?  (I remember ages ago this used to be done with
> something called "mod_jk" but who knows anymore).

No, I meant the apache.org infrastructure, i.e. a person (a committer) who
is familiar enough with both Nutch and the local infrastructure at
apache.org that he could set it up.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Google Search on Nutch?

Posted by Jim Wilson <wi...@gmail.com>.
Andrzej Bialecki wrote:
> ... familiar enough with Apache infrastructure in order to set it up

Could you expand a little on the experience needed?  Is this in regards to
tying Apache to Tomcat?  (I remember ages ago this used to be done with
something called "mod_jk" but who knows anymore).

Maybe someone already has a Nutch server pointed to the Nutch website and
restricted to that domain?

-- Jim

On 1/3/07, Andrzej Bialecki <ab...@getopt.org> wrote:
>
> Josef Novak wrote:
> > Asked the same thing myself some time ago, but never got a response.
> > Thought it was a half-decent question though.
>
> Yes, it's a valid question :) The truth is that there are too few
> developer resources familiar enough with Apache infrastructure in order
> to set it up. Nutch is more than capable of doing this, all it takes is
> one person familiar with the infrastructure & the nightly build process,
> and with a day or two to spare ...
>
> --
> Best regards,
> Andrzej Bialecki     <><
> ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>

Re: Google Search on Nutch?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Josef Novak wrote:
> Asked the same thing myself some time ago, but never got a response.
> Thought it was a half-decent question though.

Yes, it's a valid question :) The truth is that there are too few 
developer resources familiar enough with Apache infrastructure in order 
to set it up. Nutch is more than capable of doing this, all it takes is 
one person familiar with the infrastructure & the nightly build process, 
and with a day or two to spare ...

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Google Search on Nutch?

Posted by Josef Novak <jo...@gmail.com>.
Asked the same thing myself some time ago, but never got a response.
Thought it was a half-decent question though.

On 1/3/07, Justin Hartman <jj...@gmail.com> wrote:
>
> I'm sorry but I have to ask this question - stupid as it may seem....
>
> Why does the Nutch home page [1] have Google Search integrated into
> the site when surely it should be using Nutch? What better a
> demonstration of the Nutch system than the Nutch home page?
> --
> Regards
> Justin Hartman
> PGP Key ID: 102CC123
>
> [1] http://lucene.apache.org/nutch/
>