You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Martin O'Shea <ap...@dsl.pipex.com> on 2011/03/05 20:06:16 UTC
Combining analyzers in Lucene
Hello
I have a situation where I'm using two methods in a Java class to implement
a StandardAnalyzer in Lucene to index text strings and return their word
frequencies as follows:
public void indexText(String suffix, boolean includeStopWords) {
StandardAnalyzer analyzer = null;
if (includeStopWords) {
analyzer = new StandardAnalyzer(Version.LUCENE_30);
}
else {
// Get Stop_Words to exclude them.
Set<String> stopWords = (Set<String>)
Stop_Word_Listener.getStopWords();
analyzer = new StandardAnalyzer(Version.LUCENE_30, stopWords);
}
try {
// Index text.
Directory index = new RAMDirectory();
IndexWriter w = new IndexWriter(index, analyzer, true,
IndexWriter.MaxFieldLength.UNLIMITED);
this.addTextToIndex(w, this.getTextToIndex());
w.close();
// Read index.
IndexReader ir = IndexReader.open(index);
Text_TermVectorMapper ttvm = new Text_TermVectorMapper();
int docId = 0;
ir.getTermFreqVector(docId, "text", ttvm);
// Set output.
this.setWordFrequencies(ttvm.getWordFrequencies());
w.close();
}
catch(Exception ex) {
logger.error("Error indexing elements of RSS_Feed for object " +
suffix + "\n", ex);
}
}
private void addTextToIndex(IndexWriter w, String value) throws
IOException {
Document doc = new Document();
doc.add(new Field("text"), value, Field.Store.YES,
Field.Index.ANALYZED, Field.TermVector.YES));
w.addDocument(doc);
}
Which works perfectly well but I would like to combine this with stemming
using a SnowballAnalyzer as well.
This class also has two instance variables shown in a constructor below:
public Text_Indexer(String textToIndex) {
this.textToIndex = textToIndex;
this.wordFrequencies = new HashMap<String, Integer>();
}
Can anyone tell me how best to achieve this with the code above? Should I
re-index the text when it is returned by the above code or can the stemming
be introduced into the above at all?
Thanks
Mr Morgan.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org