Posted to user@nutch.apache.org by Hans Benedict <be...@chemie.de> on 2005/06/17 08:50:56 UTC
meta-tags not indexed?
Hi,
As far as I can see, Nutch does not index any HTML meta-tags such as
description or keywords. Does anybody know the reason for this?
--
Regards,
Hans Benedict
_________________________________________________________________
Chemie.DE Information Service GmbH Hans Benedict
Seydelstraße 28 mailto: benedict@chemie.de
10117 Berlin, Germany Tel +49 30 204568-40
Fax +49 30 204568-70
www.Chemie.DE | www.ChemieKarriere.NET
www.Bionity.COM | www.BioKarriere.NET
pointers regarding searching
Posted by Emilijan Mirceski <em...@cpuedge.com>.
Hi,
What do you guys usually use for development?
Can anyone recommend a Tomcat + Ant hosting company?
I'm looking for up to 600 MB of space, preferably in the cheap range.
Thanks,
Emilijan
Re: meta-tags not indexed?
Posted by Jack Tang <hi...@gmail.com>.
http://issues.apache.org/jira/browse/NUTCH-62?page=all
/Jack
On 6/17/05, Hans Benedict <be...@chemie.de> wrote:
> Hi,
>
> as far as I can see, nutch does not index any html meta-tags like
> description or keywords. Does anybody know the reason for this?
Re: meta-tags not indexed?
Posted by Andrzej Bialecki <ab...@getopt.org>.
Howie Wang wrote:
>
>> as far as I can see, nutch does not index any html meta-tags like
>> description or keywords. Does anybody know the reason for this?
>
>
> I'm not sure why Nutch doesn't do it, but a lot of search engines
> stopped using those for scoring because they were abused by
> spam sites that would stuff them with keywords.
Same reason: keywords and description meta-tags are rarely useful these
days. They may still be useful if you crawl .gov, .mil, and
sometimes .edu domains.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
RE: meta-tags not indexed?
Posted by Howie Wang <ho...@hotmail.com>.
>as far as I can see, nutch does not index any html meta-tags like
>description or keywords. Does anybody know the reason for this?
I'm not sure why Nutch doesn't do it, but a lot of search engines
stopped using those for scoring because they were abused by
spam sites that would stuff them with keywords.
If you really want them indexed, it's not too difficult. Just copy the
index-basic plugin and add some code to index them:
String description = metadata.getProperty("description");
String keywords = metadata.getProperty("keywords");
// A page may lack either meta-tag, so skip missing values.
if (description != null) doc.add(Field.Text("content", description));
if (keywords != null) doc.add(Field.Text("content", keywords));
// Or you could add your own fields, but you'll have to
// change your query filters to pick them up:
if (description != null) doc.add(Field.Text("description", description));
if (keywords != null) doc.add(Field.Text("keywords", keywords));
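
If you want to try the extraction step on its own before touching a plugin, here is a rough standalone sketch. It is not Nutch code: MetaTagSketch and extract are made-up names, and the regex assumes the name attribute comes before content and that both use double quotes. A real plugin would read the values from the parsed page instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: pulls selected meta-tags out of raw HTML.
class MetaTagSketch {

    // Assumption: attributes appear as name="..." followed by content="...".
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"[^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Returns the non-empty "description" and "keywords" values found in the page.
    static Map<String, String> extract(String html) {
        Map<String, String> found = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            String name = m.group(1).trim().toLowerCase(Locale.ROOT);
            String content = m.group(2).trim();
            boolean wanted = name.equals("description") || name.equals("keywords");
            if (wanted && !content.isEmpty()) {
                found.putIfAbsent(name, content); // keep the first occurrence
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"description\" content=\"Chemistry news portal\">"
            + "<meta name=\"robots\" content=\"index,follow\">"
            + "</head><body></body></html>";
        // Only meta-tags that are both present and wanted end up in the map.
        System.out.println(extract(html));
    }
}
```

Tags that are absent simply never show up in the map, so the indexing code only has to loop over whatever entries it gets back.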