Posted to dev@nutch.apache.org by Kristof Kessler <kr...@googlemail.com> on 2012/05/10 07:54:30 UTC

Date format issue Nutch-Solr with NUTCH-809 Parse-metatags plugin

Hello,

I have been working with the plugin for some time and it works for
everything (approx. 100 metadata fields) I need to extract from a set
of webpages. I am mapping these fields to Solr and only have a problem
when it comes to fields which I want to convert to a format other than
string. I have several date fields which are formatted as yyyy-mm-dd
and no matter which way I try, I cannot get them to end up as Solr date
fields, as this requires the data in the format yyyy-mm-ddThh:mm:ssZ.
Simply declaring the field as date in the schema results in an error.
I have no control over the format in which the dates are stored in the
webpages and nothing I tried in Solr works, so my only remaining guess
is that I need to look into changing the format within the indexing
process in Nutch. Any hint how to do that?
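
To make the conversion concrete, here is a minimal sketch in plain
Java of what would need to happen to each value before it reaches
Solr. It is not Nutch or plugin code: the class name SolrDateSketch
and the method toSolrDate are made up for illustration, and it
assumes that a date without a time of day may be indexed as midnight
UTC.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, not part of Nutch or the parse-metatags plugin.
final class SolrDateSketch {

    // Converts a date-only string such as "2012-05-10" into the
    // full UTC timestamp form "2012-05-10T00:00:00Z".
    // Assumption: the missing time of day is taken as midnight UTC.
    static String toSolrDate(String dateOnly) {
        LocalDate date = LocalDate.parse(dateOnly.trim()); // expects yyyy-MM-dd
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toString();
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("2012-05-10"));
    }
}
```

A value that is not in the yyyy-mm-dd form makes LocalDate.parse throw
a DateTimeParseException, so real indexing code would probably want to
catch that and skip or log the field.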

In addition to this, I have been trying to find the source file for
MetaTagsIndexer.class to adjust it, but I can only find
MetaTagsParser.java, both in revision 1303371 and everywhere else I
have looked. It would be great if someone could send me the
MetaTagsIndexer.java file.

Thanks!

Kristof