Posted to dev@lucene.apache.org by bu...@apache.org on 2005/05/12 05:24:45 UTC
DO NOT REPLY [Bug 34882] -
Contrib: Main memory based SynonymMap and SynonymTokenFilter
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34882>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=34882
------- Additional Comments From whoschek@lbl.gov 2005-05-12 05:24 -------
Created an attachment (id=15002)
--> (http://issues.apache.org/bugzilla/attachment.cgi?id=15002&action=view)
SynonymMap.java
/**
* Loads the <a target="_blank"
* href="http://www.cogsci.princeton.edu/~wn/">WordNet </a> prolog file <a
* href="http://www.cogsci.princeton.edu/2.0/WNprolog-2.0.tar.gz">wn_s.pl </a>
* into a thread-safe main-memory hash map that can be used for fast
* high-frequency lookups of synonyms for any given (lowercase) word string.
* <p>
* Synonymy here is symmetric: if B is a synonym for A (A -> B), then A is also
* a synonym for B (B -> A).
* It is not necessarily transitive: A -> B and B -> C do not imply A -> C.
* <p>
* Loading typically takes about 1.5 seconds, so it should be done only once per
* (server) program execution, using a singleton pattern. Once loaded, a
* synonym lookup via {@link #getSynonyms(String)} takes constant time O(1).
* A loaded default synonym map consumes about 10 MB main memory.
* An instance is immutable, hence thread-safe.
* <p>
* This implementation borrows some ideas from the
* Lucene Syns2Index demo that Dave Spencer
* dave@searchmorph.com originally contributed to Lucene. Dave's approach
* involved a persistent Lucene index which is suitable for occasional
* lookups or very large synonym tables, but is considered unsuitable for
* high-frequency lookups of medium-sized synonym tables.
* <p>
* Example Usage:
* <pre>
* String[] words = new String[] { "hard", "woods", "forest", "wolfish", "xxxx" };
* SynonymMap map = new SynonymMap(new FileInputStream("samples/data/wn_s.pl"));
* for (int i = 0; i < words.length; i++) {
*     String[] synonyms = map.getSynonyms(words[i]);
*     System.out.println(words[i] + ":" + java.util.Arrays.asList(synonyms).toString());
* }
*
* Example output:
* hard:[arduous, backbreaking, difficult, fermented, firmly, grueling, gruelling, heavily, heavy, intemperately, knockout, laborious, punishing, severe, severely, strong, toilsome, tough]
* woods:[forest, wood]
* forest:[afforest, timber, timberland, wood, woodland, woods]
* wolfish:[edacious, esurient, rapacious, ravening, ravenous, voracious, wolflike]
* xxxx:[]
* </pre>
*
* @author whoschek.AT.lbl.DOT.gov
* @see <a target="_blank"
*      href="http://www.cogsci.princeton.edu/~wn/man/prologdb.5WN.html">prologdb
*      man page </a>
* @see <a target="_blank"
*      href="http://www.hostmon.com/rfc/advanced.jsp">Dave's synonym demo site</a>
*/
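
For readers without the attachment at hand, the following is a minimal sketch of the idea described in the Javadoc above: group the words of wn_s.pl by synset id, then link every pair of words that share a synset. It is not the attached SynonymMap.java. The class name SynonymSketch, its methods, the List-based return type, and the synset ids in the sample data are all assumptions made for illustration. It also ignores details a real loader would need to handle, such as multi-word entries and apostrophes escaped inside the Prolog atoms.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical illustration only; NOT the attached SynonymMap.java. */
final class SynonymSketch {
    // word -> sorted synonyms; never modified after load(), so safe to share between threads
    private final Map<String, List<String>> table;

    private SynonymSketch(Map<String, List<String>> table) {
        this.table = table;
    }

    /** Reads lines shaped like s(synsetId,wNum,'word',ssType,senseNum,tagCount). */
    static SynonymSketch load(Reader in) {
        Map<String, List<String>> synsets = new HashMap<>(); // synset id -> member words
        try (BufferedReader reader = new BufferedReader(in)) {
            for (String line; (line = reader.readLine()) != null; ) {
                int comma = line.indexOf(',');
                int open = line.indexOf('\'');
                int close = line.lastIndexOf('\'');
                if (!line.startsWith("s(") || comma < 2 || open < comma || close <= open) {
                    continue; // skip anything that is not a plain s(...) fact
                }
                String id = line.substring(2, comma);
                String word = line.substring(open + 1, close).toLowerCase(Locale.ROOT);
                synsets.computeIfAbsent(id, k -> new ArrayList<>()).add(word);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Link every pair inside a synset in both directions (symmetric).
        // Words are only linked when they share a synset, so A -> B and
        // B -> C does not by itself produce A -> C (not transitive).
        Map<String, TreeSet<String>> linked = new HashMap<>();
        for (List<String> members : synsets.values()) {
            for (String a : members) {
                for (String b : members) {
                    if (!a.equals(b)) {
                        linked.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
                    }
                }
            }
        }

        // Freeze the result into unmodifiable lists.
        Map<String, List<String>> frozen = new HashMap<>();
        for (Map.Entry<String, TreeSet<String>> e : linked.entrySet()) {
            frozen.put(e.getKey(), Collections.unmodifiableList(new ArrayList<>(e.getValue())));
        }
        return new SynonymSketch(frozen);
    }

    /** Single hash lookup; unknown words yield an empty list. */
    List<String> getSynonyms(String word) {
        return table.getOrDefault(word, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        // Made-up synset ids; only the line format follows the prologdb man page.
        String sample =
              "s(100000001,1,'woods',n,1,0).\n"
            + "s(100000001,2,'forest',n,1,0).\n"
            + "s(100000001,3,'wood',n,1,0).\n"
            + "s(100000002,1,'forest',n,2,0).\n"
            + "s(100000002,2,'timberland',n,1,0).\n";
        SynonymSketch map = load(new StringReader(sample));
        System.out.println("woods:" + map.getSynonyms("woods"));   // woods:[forest, wood]
        System.out.println("forest:" + map.getSynonyms("forest")); // forest:[timberland, wood, woods]
        System.out.println("xxxx:" + map.getSynonyms("xxxx"));     // xxxx:[]
    }
}
```

In this sample, "woods" and "timberland" are both synonyms of "forest" but not of each other, which is the non-transitive behaviour the Javadoc describes. Holding the loaded instance in a static final field would be one way to get the load-once behaviour recommended above.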
--
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org