Posted to user@nutch.apache.org by chee wu <ch...@gmail.com> on 2007/01/13 17:21:49 UTC

Crawling but no indexing..

Hi,
I know this kind of question has been discussed before (for example, msg04098). I have found two possible solutions; could you help me confirm whether they would work?
1. Write an IndexingFilter whose filter() method returns null for the document.
   - Could this cause exceptions such as a NullPointerException later in the process?

2. In Indexer.java, simply skip the call "writer.addDocument(doc, analyzer)" in write():
  public void write(WritableComparable key, Writable value)
      throws IOException {                          // unwrap & index doc
    Document doc = (Document) ((ObjectWritable) value).get();
    // Language-based lookup disabled; the "zh" analyzer is always used.
    //NutchAnalyzer analyzer = factory.get(doc.get("lang"));
    NutchAnalyzer analyzer = factory.get("zh");
    if (LOG.isInfoEnabled()) {
      LOG.info(" Indexing [" + doc.getField("url").stringValue() + "]" +
               " with analyzer " + analyzer +
               " (" + doc.get("lang") + ")");
    }
    // Solution 2: skip this call so nothing is added to the index.
    writer.addDocument(doc, analyzer);
  }
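To make the two options concrete, here is a minimal, self-contained Java model of them. It does not use Nutch's real classes: DocFilter, DocSink, runChain, write and the indexingEnabled flag are hypothetical stand-ins for an indexing filter, Lucene's IndexWriter and the write() method above. It assumes that the filter loop stops at the first null and that the writer checks for null before indexing; whether the actual Nutch IndexingFilters/Indexer code does this depends on the version and would need to be checked in its source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrawlNoIndexSketch {
    // Hypothetical stand-in for an indexing filter (NOT Nutch's IndexingFilter interface).
    interface DocFilter {
        Map<String, String> filter(Map<String, String> doc);
    }

    // Hypothetical stand-in for Lucene's IndexWriter.addDocument(doc, analyzer).
    interface DocSink {
        void addDocument(Map<String, String> doc);
    }

    // Solution 1: a filter that discards every document by returning null.
    static final DocFilter DROP_ALL = doc -> null;

    // A later filter that dereferences doc; it would throw a NullPointerException
    // if the chain kept calling filters after one of them returned null.
    static final DocFilter TAGGER = doc -> {
        doc.put("tagged", "yes");
        return doc;
    };

    // Runs the filters in order and stops as soon as one returns null.
    static Map<String, String> runChain(List<DocFilter> filters, Map<String, String> doc) {
        for (DocFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null; // discarded: remaining filters never see a null doc
            }
        }
        return doc;
    }

    // Simplified model of write(): guards against null docs (Solution 1)
    // and can skip addDocument entirely (Solution 2).
    static boolean write(DocSink sink, Map<String, String> doc, boolean indexingEnabled) {
        if (doc == null || !indexingEnabled) {
            return false; // nothing is handed to the index writer
        }
        sink.addDocument(doc);
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = new ArrayList<>();
        Map<String, String> doc = new HashMap<>();
        doc.put("url", "http://example.com/");

        // Solution 1: the filter chain discards the document.
        Map<String, String> filtered = runChain(List.of(DROP_ALL, TAGGER), doc);
        write(index::add, filtered, true);

        // Solution 2: the document survives filtering, but the write is skipped.
        write(index::add, runChain(List.of(TAGGER), doc), false);

        System.out.println("documents indexed: " + index.size());
    }
}
```

In this model, option 1 lets each filter decide per document, but it is only safe if every caller checks for null; option 2 drops everything at the very last step, after all filtering work has already been done.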

Are these two solutions OK for us? Or is there a better solution to this problem?