Posted to java-user@lucene.apache.org by Karthik N S <ka...@controlnet.co.in> on 2005/01/10 11:33:50 UTC

SYNONYM + GOOGLE


Hi guys,

Apologies...

Does Lucene have synonym functionality like Google's?

If you search Google using '~shoes', it returns hits based on the
synonyms.

[ I know there is a WordNet-based synonym package for Lucene in the sandbox:

http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]

Can this be achieved in Lucene? If so, how?

Thanks in advance,
Karthik

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

Posted by Ian Soboroff <ia...@nist.gov>.
Daniel Naber <da...@t-online.de> writes:

> On Wednesday 12 January 2005 01:47, David Spencer wrote:
>
>> Amusingly then, documents with the terms "liberal wienerwurst" match
>> "big dog"! :)
>
> There's something like frequency information in WordNet, it could probably 
> be used to ignore the uncommon meanings.

If you just go search CiteSeer for "WordNet", you will find the output
of every failed MS thesis experiment to improve retrieval performance
by naive application of WordNet synsets.

But I like the query expansion code.

Ian





Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

Posted by Daniel Naber <da...@t-online.de>.
On Wednesday 12 January 2005 01:47, David Spencer wrote:

> Amusingly then, documents with the terms "liberal wienerwurst" match
> "big dog"! :)

There's something like frequency information in WordNet, it could probably 
be used to ignore the uncommon meanings.
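
The idea of dropping uncommon meanings could be sketched roughly as follows. This is only an illustration in plain Java: the class FrequencyFilterSketch, the keepCommon method, the threshold, and the counts are all made up for the example, and the code does not read WordNet's actual frequency data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FrequencyFilterSketch {

    // Keeps only synonyms whose (hypothetical) sense count reaches minCount,
    // preserving the iteration order of the input map.
    static List<String> keepCommon(Map<String, Integer> senseCounts, int minCount) {
        return senseCounts.entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented counts, for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("large", 50);
        counts.put("liberal", 1);
        counts.put("bountiful", 0);
        System.out.println(keepCommon(counts, 5));
    }
}
```

Where the threshold should sit is a tuning question; a cutoff that is too high would remove useful synonyms along with the odd ones.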

Regards
 Daniel

-- 
http://www.danielnaber.de



Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

Posted by Pierrick Brihaye <pi...@culture.gouv.fr>.
Hi,

David Spencer wrote:

>> Do you plan to add expansion on other Wordnet relationships ? 
>> Hypernyms and hyponyms would be a good start point for thesaurus-like 
>> search, wouldn't it ?
> 
> Good point, I hadn't considered this - but how would it work -just 
> consider these 2 relationships "synonyms" (thus easier to use) or make 
> it separate (too academic?)

Well... the ideal case would be (easy) customization :-), from an 
external text (XML?) file. Depending on the kind of relationship, the 
boost factor could be adjusted when the query is expanded. The same goes 
for the relationships' depths.

For example, a "father" hypernym could have a boost factor of 0.8, a 
"grandfather" a boost factor of 0.4, and a "great-grandfather" a boost 
factor of 0.2. Well, I wonder whether a logarithmic scale makes more 
sense than a linear scale, but this should be customizable...
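
A minimal sketch of a depth-dependent boost, assuming a simple rule where each extra level multiplies the boost by a fixed decay. The class DepthBoostSketch, the boostForDepth method, and its parameters are hypothetical names; with a parent boost of 0.8 and a decay of 0.5 the rule gives the 0.8 / 0.4 / 0.2 values from the example, but other curves would be just as easy to plug in.

```java
class DepthBoostSketch {

    // Boost for a hypernym that is `depth` levels above the query term
    // (1 = direct parent). Each extra level multiplies the boost by `decay`.
    static double boostForDepth(int depth, double parentBoost, double decay) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        return parentBoost * Math.pow(decay, depth - 1);
    }

    public static void main(String[] args) {
        // Assumed settings: parent boost 0.8, halved at each further level.
        for (int depth = 1; depth <= 3; depth++) {
            System.out.printf("depth %d -> boost %.2f%n", depth, boostForDepth(depth, 0.8, 0.5));
        }
    }
}
```

Keeping the parent boost and the decay as parameters is what would make the rule configurable from an external file.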

>> However, I'm afraid that this kind of feature would require 
>> refactoring, probably based on WordNet-dedicated libraries. JWNL 
>> (http://jwordnet.sourceforge.net/) may be a good candidate for this.
> 
> Good point, should leverage existing code.

One thing you can also easily get from this library is WordNet's 
"exceptions", often irregular plurals (mouse/mice, addendum/addenda...). 
This is a very basic yet efficient kind of stemming, and these forms 
should be expanded with the same boost factor as the original term.

Well, there are many other relationships in WordNet. Take a look at :

http://jws-champo.ac-toulouse.fr:8080/treebolic-wordnet/
legends are here :
http://treebolic.sourceforge.net/en/browserwn.htm

Cheers,

-- 
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:pierrick.brihaye@culture.gouv.fr
+33 (0)2 99 29 67 78



Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

Posted by David Spencer <da...@tropo.com>.
Pierrick Brihaye wrote:

> Hi,
> 
> David Spencer wrote:
> 
>> One example of expansion with the synonym boost set to 0.9 is the 
>> query "big dog" expands to:
> 
> 
> Interesting.
> 
> Do you plan to add expansion on other Wordnet relationships ? Hypernyms 
> and hyponyms would be a good start point for thesaurus-like search, 
> wouldn't it ?

Good point, I hadn't considered this - but how would it work -just 
consider these 2 relationships "synonyms" (thus easier to use) or make 
it separate (too academic?)
> 
> However, I'm afraid that this kind of feature would require refactoring, 
> probably based on WordNet-dedicated libraries. JWNL 
> (http://jwordnet.sourceforge.net/) may be a good candidate for this.

Good point, should leverage existing code.


> 
> Thank you for your work.

thx,
  Dave

> 
> Cheers,
> 




Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

Posted by Pierrick Brihaye <pi...@culture.gouv.fr>.
Hi,

David Spencer wrote:

> One example of expansion with the synonym boost set to 0.9 is the query 
> "big dog" expands to:

Interesting.

Do you plan to add expansion on other WordNet relationships? Hypernyms 
and hyponyms would be a good starting point for thesaurus-like search, 
wouldn't they?

However, I'm afraid that this kind of feature would require refactoring, 
probably based on WordNet-dedicated libraries. JWNL 
(http://jwordnet.sourceforge.net/) may be a good candidate for this.

Thank you for your work.

Cheers,

-- 
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:pierrick.brihaye@culture.gouv.fr
+33 (0)2 99 29 67 78



WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

Posted by David Spencer <da...@tropo.com>.
Erik Hatcher wrote:

> 
> On Jan 10, 2005, at 6:54 PM, David Spencer wrote:
> 
>> Hi...I wrote the WordNet sandbox code - but I'm not sure if I 
>> understand this thread. Are we saying that it does not work w/ the new 
>> WordNet data, or that code in Erik's book is better/more up to date etc?
> 
> 
> I have not tried the sandbox with any versions past WordNet 1.6.  
> Karthik shows a Java API to it, which I have not used - only your code 
> that parses the prolog files.  So the book code explains exactly what is 
> in the sandbox and describes WordNet 1.6 integration.  Though WordNet 
> has evolved.
> 
>> If needed I can update the sandbox code..
> 
> 
> It'd be awesome to have current WordNet support - I haven't looked at 
> what is involved in making it so.


I verified that the code works w/ the latest WordNet (2.0), and it does 
so, no problem. The relevant data from WordNet has not changed so 
there's no need to upgrade WordNet for this package at least.

I added "query expansion", which takes a simple query string and, for 
every term, adds its synonyms. There's an optional boost parameter that 
can be used to "penalize" synonyms if you want to apply the heuristic 
that the user probably knows the right word.

One example of expansion with the synonym boost set to 0.9 is the query 
"big dog" expands to:

big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9 
bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9 
giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9 
magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9 
vainglorious^0.9 vauntingly^0.9
  dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9 
detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9 
heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9 
trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9
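
The expansion step can be sketched in plain Java as below. This is not the SynExpand class from the sandbox: SynonymExpansionSketch, its expand method, and the tiny in-memory synonym table are assumptions made for illustration, and the real code looks synonyms up in a Lucene index rather than a map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SynonymExpansionSketch {

    // Hypothetical in-memory synonym table standing in for the WordNet index.
    static final Map<String, List<String>> SYNONYMS = new LinkedHashMap<>();
    static {
        SYNONYMS.put("big", List.of("large", "liberal"));
        SYNONYMS.put("dog", List.of("hound", "wienerwurst"));
    }

    // Keeps every original term unboosted and appends each of its synonyms
    // with a "^boost" suffix, in the same style as the output shown above.
    static String expand(String query, float boost) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().toLowerCase().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(term);
            for (String synonym : SYNONYMS.getOrDefault(term, List.of())) {
                out.append(' ').append(synonym).append('^').append(boost);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: big large^0.9 liberal^0.9 dog hound^0.9 wienerwurst^0.9
        System.out.println(expand("big dog", 0.9f));
    }
}
```

A boost below 1.0 ranks documents matching the user's own word above those matching only a synonym, which is the heuristic the optional parameter is meant for.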

Amusingly then, documents with the terms "liberal wienerwurst" match 
"big dog"! :)

Javadoc is here:

http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/package-summary.html

The new query expansion is here:

http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/SynExpand.html


Want to try it out? This page *expands* a query and prints out the 
result (but doesn't execute it yet).
http://www.searchmorph.com/kat/synonym.jsp?syn=big

CVS tree here:

http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/

If you just want to use a prebuilt index it's here (1MB):
http://searchmorph.com/pub/syn_index.zip

The prebuilt jar file is here:

http://www.searchmorph.com/pub/lucene-wordnet-dev.jar


Redundant weblog entry here:

http://www.searchmorph.com/weblog/index.php?id=34

Hope y'all like it and someone finds it useful,
   Dave

PS
  Oh - it may need the 1.5 dev branch of Lucene to work - I'm not 
positive, but I tried to remove deprecation warnings and doing so may 
have tied it to the latest code...

> 
>     Erik
> 
> 


Re: SYNONYM + GOOGLE

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jan 10, 2005, at 6:54 PM, David Spencer wrote:
> Hi...I wrote the WordNet sandbox code - but I'm not sure if I 
> understand this thread. Are we saying that it does not work w/ the new 
> WordNet data, or that code in Erik's book is better/more up to date 
> etc?

I have not tried the sandbox with any versions past WordNet 1.6.  
Karthik shows a Java API to it, which I have not used - only your code 
that parses the prolog files.  So the book code explains exactly what 
is in the sandbox and describes WordNet 1.6 integration.  Though 
WordNet has evolved.

> If needed I can update the sandbox code..

It'd be awesome to have current WordNet support - I haven't looked at 
what is involved in making it so.

	Erik




Re: SYNONYM + GOOGLE

Posted by David Spencer <da...@tropo.com>.
Erik Hatcher wrote:

> Karthik,
> 
> Thanks for that info.  I knew I was behind the times with WordNet using  
> the sandbox code, but it was good enough for my purposes at the time.   
> I will definitely try out the latest WordNet offerings in the future  

Hi...I wrote the WordNet sandbox code - but I'm not sure if I understand 
this thread. Are we saying that it does not work w/ the new WordNet 
data, or that code in Erik's book is better/more up to date etc?

If needed I can update the sandbox code..

thx,
  Dave


> though.
> 
>     Erik
> 


Re: SYNONYM + GOOGLE

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Karthik,

Thanks for that info.  I knew I was behind the times with WordNet using  
the sandbox code, but it was good enough for my purposes at the time.   
I will definitely try out the latest WordNet offerings in the future  
though.

	Erik



RE: SYNONYM + GOOGLE

Posted by Karthik N S <ka...@controlnet.co.in>.
Hi Erik,

Apologies...

I may be a little off-topic for this forum, but this may help you with the
next edition of Lucene in Action.

I was working with the Java WordNet Library and, while fiddling with the
API, found something interesting:

the code attached to this message gets more synonyms than the WordNet
index available from the Lucene in Action zip file.

1) It needs the WordNet 2.0 dictionary installed

2) jwnl.jar from SourceForge

[ http://sourceforge.net/project/showfiles.php?group_id=33824&package_id=33975&release_id=196864 ]


After successful compilation

Type for watch

ORIGINAL  : "watch" OR "analog_watch" OR "digital_watch" OR "hunter" OR
"hunting_watch" OR "pendulum_watch" OR
            "pocket_watch" OR "stem-winder" OR "wristwatch" OR "wrist_watch"

FORMATTED : "watch" OR "analog watch" OR "digital watch" OR "hunter" OR
"hunting watch" OR "pendulum watch" OR "pocket watch"


Check this out; maybe you will come up with brilliant ideas.
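
The step from the ORIGINAL line to the FORMATTED line (underscores turned into spaces, terms joined with OR) might look like the sketch below. It does not use JWNL and is not the attached code; OrQueryFormatterSketch and toOrQuery are hypothetical names, and the lemma list is typed in by hand. The FORMATTED line above is shorter than the ORIGINAL one; this sketch does not try to reproduce that and keeps every distinct lemma.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class OrQueryFormatterSketch {

    // Replaces WordNet-style underscores with spaces, drops duplicates that
    // result from the replacement, quotes each phrase, and joins them with OR.
    static String toOrQuery(List<String> lemmas) {
        Set<String> phrases = new LinkedHashSet<>();
        for (String lemma : lemmas) {
            phrases.add(lemma.replace('_', ' '));
        }
        return phrases.stream()
                .map(phrase -> "\"" + phrase + "\"")
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        // A few of the lemmas from the "watch" example, entered by hand.
        System.out.println(toOrQuery(List.of("watch", "analog_watch", "pocket_watch")));
    }
}
```

Quoting each phrase matters once the underscore is gone, because "analog watch" would otherwise be parsed as two separate terms.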



with regards
Karthik



Re: SYNONYM + GOOGLE

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jan 10, 2005, at 5:33 AM, Karthik N S wrote:
> If u search Google  using  '~shoes',  It returns  hits  based on the
> Synonym's
>
> [ I know there is a Synonym Wordnet  based Lucene Package in the sandbox
>
> http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]
>
> Can this be achieved in Lucene ,If so How ???

Yes, it can be achieved.  Not quite synonyms, but various forms of the  
same word can be found in this example, like this search for similar  
(see the highlighted variations):

	http://www.lucenebook.com/search?query=similar

This is accomplished using the Snowball stemmer filter found in the  
sandbox.   For synonyms, you have lots of options.  In Lucene in Action  
I demonstrate custom analyzers that inject synonyms using the WordNet  
database (from the sandbox).  From the source code distribution of LIA:

% ant SynonymAnalyzerViewer
Buildfile: build.xml

SynonymAnalyzerViewer:
      [echo]
      [echo]       Using a custom SynonymAnalyzer, two fixed strings are
      [echo]       analyzed with the results displayed.  Synonyms, from the
      [echo]       WordNet database, are injected into the same positions
      [echo]       as the original words.
      [echo]
      [echo]       See the "Analysis" chapter for more on synonym injection and
      [echo]       position increments.  The "Tools and extensions" chapter covers
      [echo]       the WordNet feature found in the Lucene sandbox.
      [echo]
     [input] Press return to continue...

      [echo] Running lia.analysis.synonym.SynonymAnalyzerViewer...

      [java] 1: [quick] [warm] [straightaway] [spry] [speedy] [ready] [quickly] [promptly] [prompt] [nimble] [immediate] [flying] [fast] [agile]
      [java] 2: [brown] [brownness] [brownish]
      [java] 3: [fox] [trick] [throw] [slyboots] [fuddle] [fob] [dodger] [discombobulate] [confuse] [confound] [befuddle] [bedevil]
      [java] 4: [jumps]
      [java] 5: [over] [o] [across]
      [java] 6: [lazy] [faineant] [indolent] [otiose] [slothful]
      [java] 7: [dogs]

...
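
To make "injected into the same positions" concrete, here is a small plain-Java simulation. It does not use Lucene's analysis classes and is not the SynonymAnalyzer from the book; the Token class, inject, and render are hypothetical names. The only idea it borrows is that a synonym carries a position increment of 0, so it shares the position of the word before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SynonymPositionSketch {

    // Minimal token: the term text plus how far it advances the position counter.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emits each word with increment 1, then its synonyms with increment 0.
    static List<Token> inject(List<String> words, Map<String, List<String>> synonyms) {
        List<Token> out = new ArrayList<>();
        for (String word : words) {
            out.add(new Token(word, 1));
            for (String synonym : synonyms.getOrDefault(word, List.of())) {
                out.add(new Token(synonym, 0));
            }
        }
        return out;
    }

    // Renders tokens grouped by position, similar to the viewer output above.
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        int position = 0;
        for (Token token : tokens) {
            if (token.positionIncrement > 0) {
                position += token.positionIncrement;
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(position).append(':');
            }
            sb.append(" [").append(token.term).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = Map.of("quick", List.of("fast", "speedy"));
        System.out.println(render(inject(List.of("quick", "brown"), synonyms)));
    }
}
```

Because the synonyms sit at the same position as the original word, a phrase search that spans that position can match either form.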

The phrase analyzed was "The quick brown fox jumps over the lazy dogs".  
  Why no synonyms for "jumps" and "dogs"?  WordNet has synonyms for  
"jump" and "dog", but not the plural forms.  Stemming would be a  
necessary step in achieving full synonym look-up, though this would  
need to be done carefully as the stem of a word is not necessarily a  
real word itself - so you'd probably want to stem the synonym database  
also to ensure accurate lookup.
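
A rough sketch of stemming both sides of the lookup, so that "jumps" and "dogs" can find the entries stored under "jump" and "dog". The stem method here is a deliberately naive stand-in that strips one trailing "s"; it is not the Snowball stemmer, and StemmedSynonymLookupSketch, stemKeys, and lookup are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StemmedSynonymLookupSketch {

    // Naive placeholder for a real stemmer: lowercases and strips one trailing "s".
    static String stem(String word) {
        String w = word.toLowerCase();
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    // Re-keys the synonym table by stem. If two words share a stem, the later
    // entry overwrites the earlier one; a real version would merge the lists.
    static Map<String, List<String>> stemKeys(Map<String, List<String>> synonyms) {
        Map<String, List<String>> stemmed = new HashMap<>();
        synonyms.forEach((word, syns) -> stemmed.put(stem(word), syns));
        return stemmed;
    }

    // Stems the query term with the same function before looking it up.
    static List<String> lookup(Map<String, List<String>> stemmedTable, String term) {
        return stemmedTable.getOrDefault(stem(term), List.of());
    }

    public static void main(String[] args) {
        Map<String, List<String>> table =
                stemKeys(Map.of("jump", List.of("leap"), "dog", List.of("hound")));
        System.out.println(lookup(table, "jumps"));
        System.out.println(lookup(table, "dogs"));
    }
}
```

Using the same stem function for the table keys and the query terms is the point: the stems only need to agree with each other, not to be real words.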

Also notice the semantically incorrect synonyms that appear for the  
animal fox ("confuse", for example).  Be careful!  :)

	Erik

