Posted to dev@lucene.apache.org by bu...@apache.org on 2005/05/12 05:24:45 UTC

DO NOT REPLY [Bug 34882] - Contrib: Main memory based SynonymMap and SynonymTokenFilter

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34882>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

------- Additional Comments From whoschek@lbl.gov  2005-05-12 05:24 -------
Created an attachment (id=15002)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=15002&action=view)
SynonymMap.java

/**
 * Loads the <a target="_blank" 
 * href="http://www.cogsci.princeton.edu/~wn/">WordNet </a> prolog file <a
 * href="http://www.cogsci.princeton.edu/2.0/WNprolog-2.0.tar.gz">wn_s.pl </a>
 * into a thread-safe main-memory hash map that can be used for fast
 * high-frequency lookups of synonyms for any given (lowercase) word string.
 * <p>
 * The synonym relation is symmetric: if B is a synonym for A (A -> B), then A
 * is also a synonym for B (B -> A).
 * It is not necessarily transitive: A -> B and B -> C do not imply A -> C.
 * <p>
 * Loading typically takes about 1.5 seconds, so it should be done only once per
 * (server) program execution, using a singleton pattern. Once loaded, a
 * synonym lookup via {@link #getSynonyms(String)} takes constant time O(1).
 * A loaded default synonym map consumes about 10 MB of main memory.
 * An instance is immutable, hence thread-safe.
 * <p>
 * This implementation borrows some ideas from the
 * Lucene Syns2Index demo that Dave Spencer
 * (dave&#064;searchmorph.com) originally contributed to Lucene. Dave's approach
 * involved a persistent Lucene index, which is suitable for occasional
 * lookups or very large synonym tables but is considered unsuitable for
 * high-frequency lookups of medium-sized synonym tables.
 * <p>
 * Example Usage:
 * <pre>
 * String[] words = new String[] { "hard", "woods", "forest", "wolfish", "xxxx" };
 * SynonymMap map = new SynonymMap(new FileInputStream("samples/data/wn_s.pl"));
 * for (int i = 0; i &lt; words.length; i++) {
 *     String[] synonyms = map.getSynonyms(words[i]);
 *     System.out.println(words[i] + ":" + java.util.Arrays.asList(synonyms).toString());
 * }
 * 
 * Example output:
 * hard:[arduous, backbreaking, difficult, fermented, firmly, grueling, gruelling, heavily, heavy, intemperately, knockout, laborious, punishing, severe, severely, strong, toilsome, tough]
 * woods:[forest, wood]
 * forest:[afforest, timber, timberland, wood, woodland, woods]
 * wolfish:[edacious, esurient, rapacious, ravening, ravenous, voracious, wolflike]
 * xxxx:[]
 * </pre>
 * 
 * @author whoschek.AT.lbl.DOT.gov
 * @see <a target="_blank"
 *      href="http://www.cogsci.princeton.edu/~wn/man/prologdb.5WN.html">prologdb
 *      man page</a>
 * @see <a target="_blank"
 *      href="http://www.hostmon.com/rfc/advanced.jsp">Dave's synonym demo site</a>
 */
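
The symmetric but not necessarily transitive relation described in the comment above arises naturally if words are grouped into synonym sets and a word may belong to more than one set. The following is only a minimal sketch of that idea, not the attached SynonymMap.java: the class name SynonymSketch, the build and getSynonyms helpers, and the sample (synset id, word) pairs are all hypothetical, and parsing of wn_s.pl is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; not the attached SynonymMap implementation.
final class SynonymSketch {

    /** Builds a word -> sorted synonyms table from (synsetId, word) pairs. */
    static Map<String, SortedSet<String>> build(String[][] entries) {
        // Step 1: group the lowercased words by the synset they belong to.
        Map<String, List<String>> synsets = new HashMap<>();
        for (String[] entry : entries) {
            synsets.computeIfAbsent(entry[0], k -> new ArrayList<>())
                   .add(entry[1].toLowerCase(Locale.ROOT));
        }
        // Step 2: within each synset, link every word to every other word.
        // Linking both directions makes the relation symmetric.
        Map<String, SortedSet<String>> table = new HashMap<>();
        for (List<String> group : synsets.values()) {
            for (String a : group) {
                for (String b : group) {
                    if (!a.equals(b)) {
                        table.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
                    }
                }
            }
        }
        return table;
    }

    /** Returns the synonyms of a word, or an empty array if none are known. */
    static String[] getSynonyms(Map<String, SortedSet<String>> table, String word) {
        SortedSet<String> synonyms = table.get(word);
        return synonyms == null ? new String[0] : synonyms.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Made-up sample data: "forest" belongs to two synsets.
        String[][] entries = {
            {"1", "woods"}, {"1", "forest"}, {"1", "wood"},
            {"2", "forest"}, {"2", "timberland"},
        };
        Map<String, SortedSet<String>> table = build(entries);
        for (String word : new String[] {"woods", "forest", "timberland", "xxxx"}) {
            System.out.println(word + ":" + Arrays.asList(getSynonyms(table, word)));
        }
    }
}
```

With this sample data, "woods" and "timberland" are each linked to "forest" but not to each other, which is the A -> B, B -> C without A -> C case.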

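The advice to load only once per program execution via a singleton could be followed with Java's initialization-on-demand holder idiom, sketched below. This is an assumption about one reasonable way to do it, not something taken from the attachment: SynonymsHolder, its loadCount counter, and the stand-in table are hypothetical, and real code would construct a SynonymMap from wn_s.pl inside load().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of load-once access; real code would build a
// SynonymMap from wn_s.pl instead of the stand-in table below.
final class SynonymsHolder {

    // Counts loads so the "only once" behavior can be observed.
    static int loadCount = 0;

    private SynonymsHolder() {}

    // The JVM initializes this nested class lazily and thread-safely on the
    // first call to get(), so load() runs at most once.
    private static final class Lazy {
        static final Map<String, List<String>> TABLE = load();
    }

    static Map<String, List<String>> get() {
        return Lazy.TABLE;
    }

    private static Map<String, List<String>> load() {
        loadCount++;
        Map<String, List<String>> table = new HashMap<>();
        // Stand-in data; the expensive file parsing would happen here.
        table.put("woods", Collections.unmodifiableList(Arrays.asList("forest", "wood")));
        return Collections.unmodifiableMap(table);
    }

    public static void main(String[] args) {
        Map<String, List<String>> first = get();
        Map<String, List<String>> second = get();
        System.out.println(first == second); // same shared instance
        System.out.println(loadCount);       // number of times load() ran
    }
}
```

Because the shared table is never modified after construction, callers on any thread can read it without extra locking.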
