Posted to dev@nutch.apache.org by Aaron Binns <aa...@archive.org> on 2009/06/18 23:28:10 UTC

Language plugin tokenizers in Indexer?

I've been working on bringing the NutchWAX project in line with the
Nutch 1.0 release.

One of the Nutch 1.0 features I'm interested in using is the language
analysis plugin so that I can start playing with tokenizers for Chinese,
Japanese, etc.

After looking at Indexer.java and SolrIndexer.java, I couldn't see how
the language plugins are used.  I did see their use in the "new" scoring
and indexing stuff: FieldIndexer.java and related classes.

Are the language-specific tokenizer plugins used only by the new
FieldIndexer system?  Or are they also used by the traditional Lucene
indexer and I just overlooked it?


Thanks!

Aaron

-- 
Aaron Binns
Senior Software Engineer, Web Group
Internet Archive
aaron@archive.org