Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/12/08 22:19:46 UTC

bug in obtaining 'tstamp' field for 2.x BasicIndexingFilter

Hi,

After running with Sebastian's patch for NUTCH-1038 applied, I get the
following results:

lewismc@lewismc-HP-Mini-110-3100:~/ASF/2.x/runtime/local$ ./bin/nutch indexchecker http://apache.org
content :	Welcome to The Apache Software Foundation! The Apache Software Foundation Community-led development
title :	Welcome to The Apache Software Foundation!
host :	apache.org
tstamp :	1970-01-01T00:00:00.000Z
url :	http://apache.org

Looking at the BasicIndexingFilter code that adds the tstamp field, I see:

    // add timestamp when fetched, for deduplication
    String tstamp = DateUtil.getThreadLocalDateFormat().format(
        new Date(page.getFetchTime()));
    doc.add("tstamp", tstamp);

DateUtil belongs to Solr [0].
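
The tstamp shown above is exactly the Unix epoch, which is what this kind
of formatting yields for a fetch time of 0. That would be consistent with
page.getFetchTime() returning 0 here, although I have not confirmed it.
A minimal sketch of the effect follows. TstampSketch and formatTstamp are
hypothetical names, and a plain SimpleDateFormat set to UTC stands in for
Solr's DateUtil, on the assumption that the two produce the same pattern:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TstampSketch {

    // Hypothetical stand-in for DateUtil.getThreadLocalDateFormat().format(...).
    // The pattern and UTC zone are assumptions chosen to match the output above.
    static String formatTstamp(long fetchTimeMillis) {
        SimpleDateFormat fmt =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(fetchTimeMillis));
    }

    public static void main(String[] args) {
        // A fetch time of 0 milliseconds formats as the epoch.
        System.out.println(formatTstamp(0L));
        // A real fetch time formats as the current date and time.
        System.out.println(formatTstamp(System.currentTimeMillis()));
    }
}
```

With a fetch time of 0 the first line printed is 1970-01-01T00:00:00.000Z,
the same value indexchecker reports. If that is the cause, the formatting
code is fine and the question becomes why the fetch time is unset.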

I have not debugged this yet. I just wanted to put it out in the open to
see whether anyone else has experienced irregularities with
BasicIndexingFilter.

Best

Lewis

[0] http://s.apache.org/dnA

-- 
Lewis