Posted to user@nutch.apache.org by Zhou LiBing <zh...@gmail.com> on 2005/03/03 15:43:36 UTC

Re: [Nutch-general] Request for Simple Tutorial

Thanks, Olaf.

I will try.


On Tue, 1 Mar 2005 13:17:58 +0100, Olaf Thiele <ol...@gmail.com> wrote:
> Hi Zhou,
> sounds like DMOZ is not as bad an option as you
> said. Why don't you use it as a starting point for
> searching images?
> 
> But please keep in mind that Nutch does not crawl
> images by default. I would suggest you do the following:
> 
> 1. start using Nutch with default values and classical text search
> 2. grow your index and become comfortable with Nutch
> 3. look at plug-ins for dealing with content (rtf, pdf, ...)
> 4. build your own plug-in for dealing with images (to extract size, ...)
> 
> Kind regards,
> Olaf
> 
> 
> On Tue, 1 Mar 2005 20:03:18 +0800, Zhou LiBing <zh...@gmail.com> wrote:
> > I just want to finish an image search engine. My team has six graduate
> > students, and my job is to collect the resources, such as images, text, etc.
> >
> > Do you have any suggestions about this?
> >
> > Thanks anyway!
> >
> >
> > On Tue, 1 Mar 2005 08:05:06 +0100, Olaf Thiele <ol...@gmail.com> wrote:
> > > Hi,
> > > if you want to build an index with 100 million pages, I recommend
> > > Thompson's rule for first-time telescope makers:
> > > It is faster to make a four-inch mirror then a six-inch mirror than to
> > > make a six-inch mirror (http://www.javaranch.com/granny.jsp).
> > >
> > > For more information on a big index, read the following thread:
> > > http://sourceforge.net/mailarchive/message.php?msg_id=10163623
> > >
> > > And for the second question, if you are not using DMOZ data,
> > > you will need to find your own. WHAT do you want to index?
> > > There must be a reason for you to build a search engine.
> > >
> > > Kind regards,
> > > Olaf
> > >
> > > On Tue, 1 Mar 2005 09:37:17 +0800, Zhou LiBing <zh...@gmail.com> wrote:
> > > > If I do not use the DMOZ data, how could I complete the search engine?
> > > >
> > > >
> > > > On Mon, 28 Feb 2005 18:14:33 -0600, Ivaylo Georgiev <iv...@esite.com> wrote:
> > > > >
> > > > >
> > > > > I just ran the tutorial and read about hardware requirements for running
> > > > > Nutch.
> > > > >
> > > > > I have some questions. What does "search nodes" mean?
> > > > >
> > > > > Assume I want to index 100 million pages and I have 5 machines to use as
> > > > > search nodes. How must these search nodes be built, and what part of Nutch
> > > > > must reside on these machines?
> > > > >
> > > > >
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Ivo
> > > >
> > > > --
> > > > ---Letter From your friend Blue at HUST CGCL---
> > > >
> > > > -------------------------------------------------------
> > > > SF email is sponsored by - The IT Product Guide
> > > > Read honest & candid reviews on hundreds of IT Products from real users.
> > > > Discover which products truly live up to the hype. Start reading now.
> > > > http://ads.osdn.com/?ad_ide95&alloc_id396&opclick
> > > > _______________________________________________
> > > > Nutch-general mailing list
> > > > Nutch-general@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/nutch-general
> > > >
> > >
> > > --
> > >
> > > <SimpleHuman gender="male">
> > >   <Physical name="Olaf Thiele" />
> > >   <Virtual adress="http://www.olafthiele.de" />
> > > </SimpleHuman>
> > >
> > >
> >
> > --
> > ---Letter From your friend Blue at HUST CGCL---
> >
> >
> 
> --
> 
> <SimpleHuman gender="male">
>   <Physical name="Olaf Thiele" />
>   <Virtual adress="http://www.olafthiele.de" />
> </SimpleHuman>
> 
> 


-- 
---Letter From your friend Blue at HUST CGCL---