Posted to user@nutch.apache.org by "Matthias W." <Ma...@e-projecta.com> on 2008/09/11 10:53:23 UTC

Edit index structure

Hi,
is it possible to edit the index structure of Nutch?

I have the following problem:
The files will be indexed by Nutch, and the frontend will be implemented with
Zend Framework 1.6.0 (Zend_Search_Lucene).
IMO Zend_Search_Lucene doesn't support the Nutch index structure, so I can
only read the title, url, digest code, tstamp and score fields from the Nutch
index, but I'm not able to read the digest itself or any other fields.
Can I change which fields are stored in the index? If so, where?
Or are there other ways to solve this problem?

I have an additional question concerning Nutch (version 0.9):
Does Nutch check the MIME type of files before indexing, or does it only check
the file extension to select the matching parser?
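To make the two strategies in this question concrete, here is a small, hypothetical Java sketch. It is not Nutch's code and makes no claim about what Nutch 0.9 actually does. The class name MimeGuess, the tiny extension table and the few magic-byte signatures are illustrative assumptions. It contrasts trusting the file name's extension with sniffing the first bytes of the content.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

// Hypothetical illustration only; not taken from Nutch.
public class MimeGuess {
    // Illustrative extension table; real systems use much larger mappings.
    private static final Map<String, String> BY_EXTENSION = Map.of(
        "pdf", "application/pdf",
        "html", "text/html",
        "htm", "text/html",
        "zip", "application/zip");

    /** Strategy 1: trust the extension in the file name (null if unknown). */
    static String byExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) return null;
        return BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase());
    }

    /** Strategy 2: inspect the leading "magic" bytes of the content (null if unknown). */
    static String byMagicBytes(byte[] content) {
        if (startsWith(content, "%PDF-".getBytes(StandardCharsets.US_ASCII))) return "application/pdf";
        if (startsWith(content, new byte[] {'P', 'K', 3, 4})) return "application/zip";
        String head = new String(content, 0, Math.min(content.length, 64), StandardCharsets.US_ASCII)
            .trim().toLowerCase();
        if (head.startsWith("<!doctype html") || head.startsWith("<html")) return "text/html";
        return null;
    }

    // True if data begins with the given prefix bytes.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOfRange(data, 0, prefix.length), prefix);
    }

    public static void main(String[] args) {
        // A PDF saved under a misleading name: the two strategies disagree.
        byte[] pdfBytes = "%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(byExtension("report.html")); // text/html
        System.out.println(byMagicBytes(pdfBytes));     // application/pdf
    }
}
```

When the two answers disagree, an extension-only approach would hand a PDF to an HTML parser, which is the situation the question is asking about.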
-- 
View this message in context: http://www.nabble.com/Edit-index-structure-tp19430556p19430556.html
Sent from the Nutch - User mailing list archive at Nabble.com.