Posted to java-user@lucene.apache.org by Abhishek Shivkumar <as...@in.ibm.com> on 2013/04/18 14:34:00 UTC

Why doesn't this code run - Adding synonyms from Wordnet to Lucene Index

I am writing this code as part of my CustomAnalyzer:

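    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.synonym.SynonymFilter;
    import org.apache.lucene.analysis.synonym.SynonymMap;
    import org.apache.lucene.util.CharsRef;
    import org.apache.lucene.util.Version;

    public class CustomAnalyzer extends Analyzer {

        SynonymMap mySynonymMap = null;

        CustomAnalyzer() throws IOException {
            // true = de-duplicate identical rules
            SynonymMap.Builder builder = new SynonymMap.Builder(true);

            FileReader fr = new FileReader("/home/watsonuser/Downloads/wordnetSynonyms.txt");
            BufferedReader br = new BufferedReader(fr);
            String line = "";

            // Each line is a comma-separated synset; the first word on the
            // line is mapped to every word on that line.
            while ((line = br.readLine()) != null) {
                String[] synset = line.split(",");
                for (String syn : synset)
                    builder.add(new CharsRef(synset[0]), new CharsRef(syn), true);
            }

            br.close();
            fr.close();

            try {
                mySynonymMap = builder.build();
            } catch (IOException e) {
                System.out.println("Unable to build synonymMap");
                e.printStackTrace();
            }
        }

        public TokenStream tokenStream(String fieldName, Reader reader) {
            // Chain: tokenize, standard filter, lowercase, stop words, synonyms, stem
            TokenStream result = new StandardTokenizer(Version.LUCENE_36, reader);
            result = new StandardFilter(result);
            result = new LowerCaseFilter(result);
            result = new StopFilter(true, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
            result = new SynonymFilter(result, mySynonymMap, true);
            result = new PorterStemFilter(result);
            return result;
        }
    }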

If I use this same CustomAnalyzer at query time and enter the query

    myFieldName: manager

the query is expanded with the synonyms for manager.

However, I want the synonyms to be added only to my index; I do not want
the query itself to be expanded with synonyms.

So I removed the SynonymFilter from the CustomAnalyzer used for querying
only (the indexing analyzer still includes it). The query then remains

    myFieldName: manager

but it fails to retrieve documents that contain synonyms of manager.
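
To make the setup concrete, here is a toy sketch in plain Java collections. It is not Lucene's API, and the class name, method names, sample synonym line and sample documents are all hypothetical. It models index-time-only expansion and shows one possible cause of the misses, assuming manager is not the first word on its line in the synonyms file: the loop in the constructor maps only synset[0] to the other words, so a document containing a non-first word is never expanded.

```java
import java.util.*;

// Toy model only: an in-memory inverted index, not Lucene.
class SynonymIndexSketch {

    // Mirrors the constructor loop: only the first word of each line is a key.
    static Map<String, Set<String>> headToSynonyms(List<String> lines) {
        Map<String, Set<String>> map = new HashMap<>();
        for (String line : lines) {
            String[] synset = line.split(",");
            Set<String> targets = map.computeIfAbsent(synset[0].trim(), k -> new LinkedHashSet<>());
            for (String syn : synset) targets.add(syn.trim());
        }
        return map;
    }

    // Alternative: every word of a line maps to every word of that line.
    static Map<String, Set<String>> anyToSynonyms(List<String> lines) {
        Map<String, Set<String>> map = new HashMap<>();
        for (String line : lines) {
            String[] synset = line.split(",");
            for (String key : synset) {
                Set<String> targets = map.computeIfAbsent(key.trim(), k -> new LinkedHashSet<>());
                for (String syn : synset) targets.add(syn.trim());
            }
        }
        return map;
    }

    // Index-time expansion: each term is indexed together with its synonyms.
    static Map<String, Set<Integer>> buildIndex(List<String> docs, Map<String, Set<String>> synonyms) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId).toLowerCase().split("\\s+")) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
                for (String syn : synonyms.getOrDefault(term, Collections.singleton(term))) {
                    index.computeIfAbsent(syn, k -> new TreeSet<>()).add(docId);
                }
            }
        }
        return index;
    }

    // Query side: the term is looked up as-is, with no synonym expansion.
    static Set<Integer> search(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("director,manager,boss");
        List<String> docs = Arrays.asList("the director spoke", "my boss left", "a manager called");

        Map<String, Set<Integer>> oneWay = buildIndex(docs, headToSynonyms(lines));
        Map<String, Set<Integer>> allWays = buildIndex(docs, anyToSynonyms(lines));

        System.out.println(search(oneWay, "manager"));  // prints [0, 2]: the "boss" document is missed
        System.out.println(search(allWays, "manager")); // prints [0, 1, 2]
    }
}
```

In this toy model the unexpanded query finds every synonym document only when each word of a synset is a key in the map. Whether the same holds for the real index depends on the contents of the synonyms file and on the documents having been re-indexed with the SynonymFilter in place.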

How do we solve this problem?

Thanks
Abhishek S