Posted to user@nutch.apache.org by Hans Benedict <be...@chemie.de> on 2005/06/17 08:50:56 UTC

meta-tags not indexed?

Hi,

as far as I can see, nutch does not index any html meta-tags like 
description or keywords. Does anybody know the reason for this?

-- 

Regards,
Hans Benedict

_________________________________________________________________
Chemie.DE Information Service GmbH     Hans Benedict
Seydelstraße 28                        mailto: benedict@chemie.de
10117 Berlin, Germany                  Tel +49 30 204568-40
                                       Fax +49 30 204568-70

www.Chemie.DE               |          www.ChemieKarriere.NET   
www.Bionity.COM             |          www.BioKarriere.NET 


pointers regarding searching

Posted by Emilijan Mirceski <em...@cpuedge.com>.
Hi,

What do you guys usually use for development?

Can anyone recommend a Tomcat + Ant hosting company?
I'm looking for up to 600 MB of space, preferably in the cheap range.


Thanks,
Emilijan


Re: meta-tags not indexed?

Posted by Jack Tang <hi...@gmail.com>.
http://issues.apache.org/jira/browse/NUTCH-62?page=all

/Jack

On 6/17/05, Hans Benedict <be...@chemie.de> wrote:
> Hi,
> 
> as far as I can see, nutch does not index any html meta-tags like
> description or keywords. Does anybody know the reason for this?

Re: meta-tags not indexed?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Howie Wang wrote:
> 
>> as far as I can see, nutch does not index any html meta-tags like 
>> description or keywords. Does anybody know the reason for this?
> 
> 
> I'm not sure why Nutch doesn't do it, but a lot of search engines
> stopped using those for scoring because they were abused by
> spam sites that would stuff them with keywords.

Same reason: keywords and description meta-tags are rarely useful these
days. They may still be useful, though, if you crawl .gov, .mil, and
sometimes .edu domains.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


RE: meta-tags not indexed?

Posted by Howie Wang <ho...@hotmail.com>.
>as far as I can see, nutch does not index any html meta-tags like 
>description or keywords. Does anybody know the reason for this?

I'm not sure why Nutch doesn't do it, but a lot of search engines
stopped using those for scoring because they were abused by
spam sites that would stuff them with keywords.

If you really want them indexed, it's not too difficult. Just copy the
index-basic plugin and add some code to index the tags:

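    String description = metadata.getProperty("description");
    String keywords = metadata.getProperty("keywords");

    // Append the tag text to the main "content" field, skipping
    // tags the page does not define:
    if (description != null) doc.add(Field.Text("content", description));
    if (keywords != null) doc.add(Field.Text("content", keywords));

    // Or you could add your own fields, but you'll have to
    // change your query filters to pick them up:
    if (description != null) doc.add(Field.Text("description", description));
    if (keywords != null) doc.add(Field.Text("keywords", keywords));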
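
Many pages lack one or both tags, so the indexing code probably wants to skip missing or blank values rather than add them. Here is a minimal standalone sketch of that step. It is not Nutch or Lucene code: java.util.Properties stands in for the page metadata, a plain Map stands in for the index document, and the class name MetaTagFields and the helper collect() are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Standalone illustration only: Properties stands in for the page
 * metadata and a Map stands in for the index document. Neither is
 * the Nutch or Lucene API.
 */
public class MetaTagFields {

    // Hypothetical helper: copies the meta tags that are actually
    // present into a field-name -> value map.
    static Map<String, String> collect(Properties metadata) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : new String[] {"description", "keywords"}) {
            String value = metadata.getProperty(name);
            // getProperty returns null for a missing key; skip those,
            // and also skip tags that are present but blank.
            if (value != null && !value.trim().isEmpty()) {
                fields.put(name, value.trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Properties metadata = new Properties();
        metadata.setProperty("description", "Chemistry information portal");
        // No "keywords" entry on purpose: it should simply be left out.
        System.out.println(collect(metadata));
    }
}
```

In a real plugin, each entry of the returned map would become one doc.add(...) call like the ones in the snippet above.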