Posted to java-user@lucene.apache.org by Karthik N S <ka...@controlnet.co.in> on 2005/01/10 11:33:50 UTC
SYNONYM + GOOGLE
Hi guys,
Apologies...
Does Lucene have synonym functionality like Google's?
If you search Google using '~shoes', it returns hits based on
synonyms.
[ I know there is a WordNet-based synonym package for Lucene in the sandbox:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]
Can this be achieved in Lucene? If so, how?
Thanks in advance,
Karthik
WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Posted by Ian Soboroff <ia...@nist.gov>.
Daniel Naber <da...@t-online.de> writes:
> On Wednesday 12 January 2005 01:47, David Spencer wrote:
>
>> Amusingly then, documents with the terms "liberal wienerwurst" match
>> "big dog"! :)
>
> There's something like frequency information in WordNet, it could probably
> be used to ignore the uncommon meanings.
If you just go search CiteSeer for "WordNet", you will find the output
of every failed MS thesis experiment to improve retrieval performance
by naive application of WordNet synsets.
But I like the query expansion code.
Ian
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Posted by Daniel Naber <da...@t-online.de>.
On Wednesday 12 January 2005 01:47, David Spencer wrote:
> Amusingly then, documents with the terms "liberal wienerwurst" match
> "big dog"! :)
There's something like frequency information in WordNet, it could probably
be used to ignore the uncommon meanings.
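A rough sketch of that idea in Java, assuming each candidate synonym arrives with a usage count for the sense it belongs to. The class name, the threshold and the counts are all invented for illustration; they are not real WordNet figures and this is not part of the sandbox code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: counts are made up, not taken from WordNet.
class FrequencyFilter {
    // Keep only the synonyms whose sense count reaches minCount.
    static List<String> keepCommon(Map<String, Integer> senseCounts, int minCount) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : senseCounts.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> dog = new LinkedHashMap<>();
        dog.put("hound", 12);       // invented count for a common sense
        dog.put("andiron", 0);      // invented count for a rare sense
        dog.put("wienerwurst", 0);  // invented count for a rare sense
        System.out.println(keepCommon(dog, 1));
    }
}
```

With a threshold of 1, only "hound" survives in this toy table; where to put the threshold would need tuning on real data.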
Regards
Daniel
--
http://www.danielnaber.de
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Posted by Pierrick Brihaye <pi...@culture.gouv.fr>.
Hi,
David Spencer wrote:
>> Do you plan to add expansion on other WordNet relationships?
>> Hypernyms and hyponyms would be a good starting point for thesaurus-like
>> search, wouldn't they?
>
> Good point, I hadn't considered this. But how would it work: just
> treat these 2 relationships as "synonyms" (thus easier to use), or keep
> them separate (too academic?)
Well... the ideal case would be (easy) customization :-), from an
external text (XML?) file. Depending on the kind of relationship, the
boost factor could be adjusted when the query is expanded. The same goes
for the depth of the relationship.
For example, a "father" hypernym could have a boost factor of 0.8, a
"grandfather" a boost factor of 0.4, and a "great-grandfather" a boost
factor of 0.2. Well, I wonder whether a logarithmic scale makes more
sense than a linear scale, but this should/would be customizable...
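To make the depth idea concrete, here is a minimal Java sketch. The class name, the relation label and the decay parameter are assumptions chosen to reproduce the 0.8 / 0.4 / 0.2 figures above; none of it exists in the sandbox code or in JWNL. Note that those figures halve at each level, which is a geometric rather than a linear progression.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the Lucene sandbox or JWNL.
class RelationBoost {
    // Assumed base boost per relationship type (could be loaded from a config file).
    private final Map<String, Double> baseBoost = new HashMap<>();
    // Assumed factor applied for each additional level of depth.
    private final double decay;

    RelationBoost(double decay) {
        this.decay = decay;
    }

    void setBase(String relation, double boost) {
        baseBoost.put(relation, boost);
    }

    // depth 1 = direct hypernym ("father"), depth 2 = "grandfather", and so on.
    double boostFor(String relation, int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        Double base = baseBoost.get(relation);
        if (base == null) {
            throw new IllegalArgumentException("unknown relation: " + relation);
        }
        return base * Math.pow(decay, depth - 1);
    }

    public static void main(String[] args) {
        RelationBoost boosts = new RelationBoost(0.5);
        boosts.setBase("hypernym", 0.8);
        for (int depth = 1; depth <= 3; depth++) {
            System.out.println(String.format(Locale.ROOT,
                    "hypernym depth %d -> boost %.2f", depth, boosts.boostFor("hypernym", depth)));
        }
    }
}
```

Swapping the decay rule (for instance, subtracting a fixed step per level) would only mean changing boostFor, which is the kind of customization asked for here.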
>> However, I'm afraid that this kind of feature would require
>> refactoring, probably based on WordNet-dedicated libraries. JWNL
>> (http://jwordnet.sourceforge.net/) may be a good candidate for this.
>
> Good point, should leverage existing code.
One more thing you can easily get from this library is WordNet's
"exceptions", often irregular plurals (mouse/mice, addendum/addenda...).
This is a very basic yet efficient kind of stemming, and these forms should
be expanded with the same boost factor as the original term.
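As a small illustration in Java, assuming the exception pairs have already been loaded into memory. The table below is hard-coded and the class is hypothetical; reading the pairs from WordNet (for example through JWNL) is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the exception table is hand-filled, not read from WordNet.
class ExceptionExpander {
    private final Map<String, List<String>> forms = new HashMap<>();

    // Register an irregular pair in both directions (mouse <-> mice).
    void addPair(String base, String inflected) {
        forms.computeIfAbsent(base, k -> new ArrayList<>()).add(inflected);
        forms.computeIfAbsent(inflected, k -> new ArrayList<>()).add(base);
    }

    // Returns the term followed by its irregular forms. No "^boost" suffix is
    // added, so every form carries the same weight as the original term.
    String expand(String term) {
        StringBuilder sb = new StringBuilder(term);
        for (String form : forms.getOrDefault(term, Collections.<String>emptyList())) {
            sb.append(' ').append(form);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ExceptionExpander ex = new ExceptionExpander();
        ex.addPair("mouse", "mice");
        ex.addPair("addendum", "addenda");
        System.out.println(ex.expand("mice"));
        System.out.println(ex.expand("addendum"));
        System.out.println(ex.expand("dog"));
    }
}
```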
Well, there are many other relationships in WordNet. Take a look at:
http://jws-champo.ac-toulouse.fr:8080/treebolic-wordnet/
The legends are here:
http://treebolic.sourceforge.net/en/browserwn.htm
Cheers,
--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:pierrick.brihaye@culture.gouv.fr
+33 (0)2 99 29 67 78
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Posted by David Spencer <da...@tropo.com>.
Pierrick Brihaye wrote:
> Hi,
>
> David Spencer wrote:
>
>> One example of expansion with the synonym boost set to 0.9 is the
>> query "big dog" expands to:
>
>
> Interesting.
>
> Do you plan to add expansion on other Wordnet relationships ? Hypernyms
> and hyponyms would be a good start point for thesaurus-like search,
> wouldn't it ?
Good point, I hadn't considered this. But how would it work: just
treat these 2 relationships as "synonyms" (thus easier to use), or keep
them separate (too academic?)
>
> However, I'm afraid that this kind of feature would require refactoring,
> probably based on WordNet-dedicated libraries. JWNL
> (http://jwordnet.sourceforge.net/) may be a good candidate for this.
Good point, should leverage existing code.
>
> Thank you for your work.
thx,
Dave
>
> Cheers,
>
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Posted by Pierrick Brihaye <pi...@culture.gouv.fr>.
Hi,
David Spencer wrote:
> One example of expansion with the synonym boost set to 0.9 is the query
> "big dog" expands to:
Interesting.
Do you plan to add expansion on other WordNet relationships? Hypernyms
and hyponyms would be a good starting point for thesaurus-like search,
wouldn't they?
However, I'm afraid that this kind of feature would require refactoring,
probably based on WordNet-dedicated libraries. JWNL
(http://jwordnet.sourceforge.net/) may be a good candidate for this.
Thank you for your work.
Cheers,
--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:pierrick.brihaye@culture.gouv.fr
+33 (0)2 99 29 67 78
WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Posted by David Spencer <da...@tropo.com>.
Erik Hatcher wrote:
>
> On Jan 10, 2005, at 6:54 PM, David Spencer wrote:
>
>> Hi...I wrote the WordNet sandbox code - but I'm not sure if I
>> understand this thread. Are we saying that it does not work w/ the new
>> WordNet data, or that code in Erik's book is better/more up to date etc?
>
>
> I have not tried the sandbox with any versions past WordNet 1.6.
> Karthik shows a Java API to it, which I have not used - only your code
> that parses the prolog files. So the book code explains exactly what is
> in the sandbox and describes WordNet 1.6 integration. Though WordNet
> has evolved.
>
>> If needed I can update the sandbox code..
>
>
> It'd be awesome to have current WordNet support - I haven't looked at
> what is involved in making it so.
I verified that the code works w/ the latest WordNet (2.0), and it does
so, no problem. The relevant data from WordNet has not changed so
there's no need to upgrade WordNet for this package at least.
I added "query expansion", which takes a simple query string and adds
the synonyms of every term. There's an optional boost parameter that
can be used to "penalize" synonyms if you want to apply the heuristic that
the user probably knows the right word.
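For readers who want a feel for the mechanics, here is a stand-alone Java sketch of the idea. It is not the sandbox's actual expansion code: the class, the method and the tiny hand-made synonym table are assumptions, and a real version would look synonyms up in the WordNet-derived index. The only Lucene convention it relies on is the query parser's `term^boost` syntax.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a hand-made synonym table stands in for the WordNet index.
class QueryExpansionSketch {
    static String expand(String query, Map<String, List<String>> synonyms, float boost) {
        StringBuilder sb = new StringBuilder();
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty query yields an empty result
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term); // the user's own word keeps the default boost
            for (String syn : synonyms.getOrDefault(term, Collections.<String>emptyList())) {
                sb.append(' ').append(syn).append('^').append(boost); // penalized synonym
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new LinkedHashMap<>();
        synonyms.put("big", Arrays.asList("large", "liberal"));
        synonyms.put("dog", Arrays.asList("hound", "wienerwurst"));
        System.out.println(expand("big dog", synonyms, 0.9f));
    }
}
```

The resulting string can then be handed to a query parser, so the original terms and their lower-weighted synonyms are all optional clauses of one query.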
One example of expansion with the synonym boost set to 0.9 is the query
"big dog" expands to:
big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9
bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9
giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9
magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9
vainglorious^0.9 vauntingly^0.9
dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9
detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9
heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9
trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9
Amusingly then, documents with the terms "liberal wienerwurst" match
"big dog"! :)
Javadoc is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/package-summary.html
The new query expansion is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/SynExpand.html
Want to try it out? This page *expands* a query and prints out the
result (but doesn't execute it yet).
http://www.searchmorph.com/kat/synonym.jsp?syn=big
CVS tree here:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/
If you just want to use a prebuilt index it's here (1MB):
http://searchmorph.com/pub/syn_index.zip
The prebuilt jar file is here:
http://www.searchmorph.com/pub/lucene-wordnet-dev.jar
Redundant weblog entry here:
http://www.searchmorph.com/weblog/index.php?id=34
Hope y'all like it and someone finds it useful,
Dave
PS
Oh - it may need the 1.5 dev branch of Lucene to work - I'm not
positive, but I tried to remove deprecation warnings and doing so may
have tied it to the latest code...
>
> Erik
Re: SYNONYM + GOOGLE
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jan 10, 2005, at 6:54 PM, David Spencer wrote:
> Hi...I wrote the WordNet sandbox code - but I'm not sure if I
> understand this thread. Are we saying that it does not work w/ the new
> WordNet data, or that code in Erik's book is better/more up to date
> etc?
I have not tried the sandbox with any versions past WordNet 1.6.
Karthik shows a Java API to it, which I have not used - only your code
that parses the prolog files. So the book code explains exactly what
is in the sandbox and describes WordNet 1.6 integration. Though
WordNet has evolved.
> If needed I can update the sandbox code..
It'd be awesome to have current WordNet support - I haven't looked at
what is involved in making it so.
Erik
Re: SYNONYM + GOOGLE
Posted by David Spencer <da...@tropo.com>.
Erik Hatcher wrote:
> Karthik,
>
> Thanks for that info. I knew I was behind the times with WordNet using
> the sandbox code, but it was good enough for my purposes at the time.
> I will definitely try out the latest WordNet offerings in the future
Hi...I wrote the WordNet sandbox code - but I'm not sure if I understand
this thread. Are we saying that it does not work w/ the new WordNet
data, or that code in Erik's book is better/more up to date etc?
If needed I can update the sandbox code..
thx,
Dave
> though.
>
> Erik
>
Re: SYNONYM + GOOGLE
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Karthik,
Thanks for that info. I knew I was behind the times with WordNet using
the sandbox code, but it was good enough for my purposes at the time.
I will definitely try out the latest WordNet offerings in the future
though.
Erik
RE: SYNONYM + GOOGLE
Posted by Karthik N S <ka...@controlnet.co.in>.
Hi Erik,
Apologies...
This may be a little off-topic for this forum, but it may help you with the next
version of Lucene in Action.
I was working with the Java WordNet Library and, while fiddling with its API, found
something interesting:
the code attached to this message gets more synonyms than the WordNet indexed
format available in the Lucene in Action zip file.
It requires:
1) WordNet 2.0's dictionary installed
2) jwnl.jar from SourceForge
[ http://sourceforge.net/project/showfiles.php?group_id=33824&package_id=33975&release_id=196864 ]
After successful compilation, the result for "watch" is:
ORIGINAL : "watch" OR "analog_watch" OR "digital_watch" OR "hunter" OR
"hunting_watch" OR "pendulum_watch" OR
"pocket_watch" OR "stem-winder" OR "wristwatch" OR "wrist_watch"
FORMATTED : "watch" OR "analog watch" OR "digital watch" OR "hunter" OR
"hunting watch" OR "pendulum watch" OR "pocket watch"
Check this out; maybe you will come up with brilliant ideas.
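The ORIGINAL to FORMATTED step shown above might look roughly like the following Java sketch. The attached code is not reproduced in this archive, so the class and method names here are made up; the only assumption carried over is that multi-word lemmas use underscores, as in the ORIGINAL line.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the formatting step; not the code attached to the message.
class OrQueryFormatter {
    static String toOrQuery(List<String> lemmas) {
        StringBuilder sb = new StringBuilder();
        for (String lemma : lemmas) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            // Multi-word lemmas are joined with underscores; a quoted phrase needs spaces.
            sb.append('"').append(lemma.replace('_', ' ')).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOrQuery(Arrays.asList("watch", "analog_watch", "pocket_watch")));
    }
}
```

Quoting each entry means the multi-word ones are searched as phrases rather than as separate words.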
with regards
Karthik
Re: SYNONYM + GOOGLE
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jan 10, 2005, at 5:33 AM, Karthik N S wrote:
> If you search Google using '~shoes', it returns hits based on
> synonyms.
>
> [ I know there is a WordNet-based synonym package for Lucene in the
> sandbox:
> http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]
>
> Can this be achieved in Lucene? If so, how?
Yes, it can be achieved. Not quite synonyms, but various forms of the
same word can be found in this example, like this search for similar
(see the highlighted variations):
http://www.lucenebook.com/search?query=similar
This is accomplished using the Snowball stemmer filter found in the
sandbox. For synonyms, you have lots of options. In Lucene in Action
I demonstrate custom analyzers that inject synonyms using the WordNet
database (from the sandbox). From the source code distribution of LIA:
% ant SynonymAnalyzerViewer
Buildfile: build.xml
SynonymAnalyzerViewer:
[echo]
[echo] Using a custom SynonymAnalyzer, two fixed strings are
[echo] analyzed with the results displayed. Synonyms, from the
[echo] WordNet database, are injected into the same positions
[echo] as the original words.
[echo]
[echo] See the "Analysis" chapter for more on synonym injection and
[echo] position increments. The "Tools and extensions" chapter covers
[echo] the WordNet feature found in the Lucene sandbox.
[echo]
[input] Press return to continue...
[echo] Running lia.analysis.synonym.SynonymAnalyzerViewer...
[java] 1: [quick] [warm] [straightaway] [spry] [speedy] [ready] [quickly] [promptly] [prompt] [nimble] [immediate] [flying] [fast] [agile]
[java] 2: [brown] [brownness] [brownish]
[java] 3: [fox] [trick] [throw] [slyboots] [fuddle] [fob] [dodger] [discombobulate] [confuse] [confound] [befuddle] [bedevil]
[java] 4: [jumps]
[java] 5: [over] [o] [across]
[java] 6: [lazy] [faineant] [indolent] [otiose] [slothful]
[java] 7: [dogs]
...
The phrase analyzed was "The quick brown fox jumps over the lazy dogs".
Why no synonyms for "jumps" and "dogs"? WordNet has synonyms for
"jump" and "dog", but not the plural forms. Stemming would be a
necessary step in achieving full synonym look-up, though this would
need to be done carefully as the stem of a word is not necessarily a
real word itself - so you'd probably want to stem the synonym database
also to ensure accurate lookup.
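A minimal Java sketch of that "stem both sides" idea follows. The stem() method is a deliberately naive plural stripper standing in for a real stemmer such as Snowball, and the class, its methods and the two synonym entries are made up for illustration; none of this is a Lucene API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch only: a toy stemmer and a hand-filled synonym table.
class StemmedSynonymLookup {
    private final Map<String, List<String>> byStem = new HashMap<>();

    // Naive stand-in for a real stemmer: strips one trailing "s" ("jumps" -> "jump").
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        boolean plural = w.length() > 3 && w.endsWith("s") && !w.endsWith("ss");
        return plural ? w.substring(0, w.length() - 1) : w;
    }

    // Index the synonyms under the stem of the headword, not the headword itself.
    void add(String headword, String... synonyms) {
        byStem.put(stem(headword), Arrays.asList(synonyms));
    }

    // Stem the query term the same way before looking it up.
    List<String> lookup(String term) {
        return byStem.getOrDefault(stem(term), Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        StemmedSynonymLookup db = new StemmedSynonymLookup();
        db.add("dog", "hound");
        db.add("jump", "leap", "bound");
        System.out.println(db.lookup("dogs"));
        System.out.println(db.lookup("jumps"));
    }
}
```

Because the same stem() is applied when building the table and when querying it, it does not matter that a stem may not be a real word; the two sides only have to agree.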
Also notice the semantically incorrect synonyms that appear for the
animal fox ("confuse", for example). Be careful! :)
Erik