Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/12/08 22:19:46 UTC
bug in obtaining 'tstamp' field for 2.x BasicIndexingFilter
Hi,
After applying Sebastian's patch for NUTCH-1038, I get the
following results:
lewismc@lewismc-HP-Mini-110-3100:~/ASF/2.x/runtime/local$ ./bin/nutch indexchecker http://apache.org
content : Welcome to The Apache Software Foundation! The Apache Software Foundation Community-led development
title : Welcome to The Apache Software Foundation!
host : apache.org
tstamp : 1970-01-01T00:00:00.000Z
url : http://apache.org
Looking at the BasicIndexingFilter code that adds the tstamp field, I see:
// add timestamp when fetched, for deduplication
String tstamp = DateUtil.getThreadLocalDateFormat().format(new Date(page.getFetchTime()));
doc.add("tstamp", tstamp);
DateUtil belongs to Solr [0].
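For what it's worth, the tstamp value in the output is exactly what the Unix epoch (0 ms) looks like in an ISO 8601 UTC format, which would be consistent with page.getFetchTime() returning 0 at that point. Below is a minimal sketch of that. The class name, the helper name, and the date pattern are my assumptions for illustration, not taken from the Solr or Nutch source; I am only assuming that the Solr format behaves like this SimpleDateFormat pattern in UTC.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-in for the formatting step in BasicIndexingFilter.
public class TstampEpochSketch {

    // Assumed pattern: ISO 8601 with milliseconds, rendered in UTC.
    static String formatTstamp(long fetchTimeMillis) {
        SimpleDateFormat fmt =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(fetchTimeMillis));
    }

    public static void main(String[] args) {
        // A fetch time of 0 (e.g. a field that was never set) formats as the epoch.
        System.out.println(formatTstamp(0L)); // prints 1970-01-01T00:00:00.000Z
    }
}
```

If that is what is happening, the formatting itself would be fine, and the thing to check would be why the fetch time is unset when the filter runs. I have not verified this.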
I have not debugged this yet; I just wanted to put it out in the open to
see whether anyone else has experienced irregularities with
BasicIndexingFilter.
Best
Lewis
[0] http://s.apache.org/dnA
--
Lewis