Posted to java-user@lucene.apache.org by Sanyi <ne...@yahoo.com> on 2004/12/21 09:04:29 UTC

Synonyms for AND/OR/NOT operators

Hi!

What is the simplest way to add synonyms for AND/OR/NOT operators?
I'd like to support two sets of operator words, so people can use either the original English
operators or my custom ones for our local language.

Thank you for your attention!
Sanyi


		

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Synonyms for AND/OR/NOT operators

Posted by Morus Walter <mo...@tanto.de>.
Sanyi writes:
> Well, I guess I'd better recognize the operator synonyms and replace them with their original forms
> before passing them to QueryParser. I don't feel comfortable tampering with Lucene's source code.
> 
Apart from knowing how to compile Lucene (including the JavaCC code
generation), you should only need to change

<DEFAULT> TOKEN : {
  <AND:       ("AND" | "&&") >
| <OR:        ("OR" | "||") >
| <NOT:       ("NOT" | "!") >

to
<DEFAULT> TOKEN : {
  <AND:       ("AND" | "<insert your version of and here>" | "&&") >
| <OR:        ("OR" | "<insert your version of or here>" | "||") >
| <NOT:       ("NOT" | "<insert your version of not here>" | "!") >

in jakarta-lucene/src/java/org/apache/lucene/queryParser/QueryParser.jj

Replacing the operators before the query is parsed might be hard to do
correctly if you want to handle cases like »"a AND b" OR c«, which is a
query for the phrase "a AND b" or the token c.
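
For readers who prefer the pre-processing route anyway, here is a minimal sketch of a rewriter that leaves quoted phrases alone. It is not part of Lucene: the class OperatorSynonyms and its methods are hypothetical names, and the German words UND/ODER/NICHT are only sample synonyms. It treats whitespace, parentheses and double quotes as word boundaries and copies backslash-escaped characters through unchanged. It does not try to cover the whole query syntax (range queries, for instance, are not special-cased).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): maps localized operator words to AND/OR/NOT. */
final class OperatorSynonyms {

    private final Map<String, String> synonyms = new LinkedHashMap<>();

    /** Registers one localized operator word, e.g. add("UND", "AND"). */
    OperatorSynonyms add(String localized, String original) {
        synonyms.put(localized, original);
        return this;
    }

    /** Rewrites registered synonyms that appear as whole words outside double quotes. */
    String normalize(String query) {
        StringBuilder out = new StringBuilder();
        StringBuilder word = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                // Escaped character: copy both characters through untouched.
                StringBuilder target = inQuotes ? out : word;
                target.append(c).append(query.charAt(++i));
            } else if (inQuotes) {
                // Inside a phrase nothing is rewritten; a quote ends the phrase.
                out.append(c);
                if (c == '"') {
                    inQuotes = false;
                }
            } else if (c == '"' || c == '(' || c == ')' || Character.isWhitespace(c)) {
                // Word boundary: emit the pending word, then the delimiter itself.
                flush(word, out);
                out.append(c);
                if (c == '"') {
                    inQuotes = true;
                }
            } else {
                word.append(c);
            }
        }
        flush(word, out);
        return out.toString();
    }

    /** Appends the pending word, replaced by its operator if it is a registered synonym. */
    private void flush(StringBuilder word, StringBuilder out) {
        String w = word.toString();
        out.append(synonyms.getOrDefault(w, w));
        word.setLength(0);
    }
}
```

The result of normalize(...) would then be handed to the stock QueryParser. One consequence of this design is that a user who wants to search for a synonym word as an ordinary term has to put it in quotes, just as with the built-in AND/OR/NOT.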

Morus





Re: Synonyms for AND/OR/NOT operators

Posted by Sanyi <ne...@yahoo.com>.
Well, I guess I'd better recognize the operator synonyms and replace them with their original forms
before passing them to QueryParser. I don't feel comfortable tampering with Lucene's source code.

Anyway, thanks for the answers.

Sanyi


Re: Synonyms for AND/OR/NOT operators

Posted by Morus Walter <mo...@tanto.de>.
Erik Hatcher writes:
> On Dec 21, 2004, at 3:04 AM, Sanyi wrote:
> > What is the simplest way to add synonyms for AND/OR/NOT operators?
> > I'd like to support two sets of operator words, so people can use 
> > either the original English
> > operators or my custom ones for our local language.
> 
> There are two options that I know of: 1) add synonyms during indexing 
> and 2) add synonyms during querying.  Generally this would be done 
> using a custom analyzer.

I guess you misunderstood the question.

I think he wants to know how to create a query parser that understands 
something like 'a UND b' as well as 'a AND b' to support localized 
operator names (German in this case).

AFAIK that can only be done by copying the query parser's JavaCC source and
adding the operators there.
It shouldn't be difficult, though it's a bit ugly since it implies code
duplication. And there will be no way of choosing the operators dynamically
at runtime. One will need to have different query parsers for different
languages.

Morus



Re: Synonyms for AND/OR/NOT operators

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Wow, I really did misunderstand.  My apologies.

Yes, you will need to fork QueryParser.jj and install JavaCC to build 
your custom parser.  It should be pretty trivial to add alternatives to 
AND(+)/OR/NOT(-).

	Erik


On Dec 21, 2004, at 4:42 AM, Sanyi wrote:

> Hi!
>
> I think we're talking about different things.
> My question is about using synonyms for AND/OR/NOT operators, not 
> about synonyms of words in the
> index.
> For example, in some language: AND = AANNDD; OR = OORR; NOT = NNOOTT
>
> So, the user can enter either:
> (cat OR kitty) AND black AND tail
>
> or:
>
> (cat OORR kitty) AANNDD black AANNDD tail
>
> Both sets of operators must work.
> It must be some kind of query parser modification/parameterization, so 
> it has nothing to do with
> the index.
>
> I hope I was more specific this time ;)
>
> Thanks,
> Sanyi


Re: Synonyms for AND/OR/NOT operators

Posted by Sanyi <ne...@yahoo.com>.
Hi!

I think we're talking about different things.
My question is about using synonyms for AND/OR/NOT operators, not about synonyms of words in the
index.
For example, in some language: AND = AANNDD; OR = OORR; NOT = NNOOTT

So, the user can enter either:
(cat OR kitty) AND black AND tail

or:

(cat OORR kitty) AANNDD black AANNDD tail

Both sets of operators must work.
It must be some kind of query parser modification/parameterization, so it has nothing to do with
the index.

I hope I was more specific this time ;)

Thanks,
Sanyi






Re: Synonyms for AND/OR/NOT operators

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 21, 2004, at 3:04 AM, Sanyi wrote:
> What is the simplest way to add synonyms for AND/OR/NOT operators?
> I'd like to support two sets of operator words, so people can use 
> either the original English
> operators or my custom ones for our local language.

There are two options that I know of: 1) add synonyms during indexing 
and 2) add synonyms during querying.  Generally this would be done 
using a custom analyzer.

If the synonym mappings are static and you don't mind a larger index, 
adding them during indexing avoids the complexity of rewriting the 
query.  Injecting synonyms during querying allows the synonym mappings 
to change dynamically, though it does produce more complex queries.  
Here's an example you'll find with the source code distribution of 
Lucene in Action which uses WordNet to look up synonyms.

	Erik

p.s. I'm sensitive to over-marketing Lucene in Action in this forum as 
it would bother me to constantly see an advertisement.  You can be sure 
that any mentions of it from me will coincide with concrete examples 
(which are freely available) that are directly related to questions 
being asked.


% ant -emacs SynonymAnalyzerViewer
Buildfile: build.xml

check-environment:

compile:

build-test-index:

build-perf-index:

prepare:

SynonymAnalyzerViewer:

       Using a custom SynonymAnalyzer, two fixed strings are
       analyzed with the results displayed.  Synonyms, from the
       WordNet database, are injected into the same positions
       as the original words.

       See the "Analysis" chapter for more on synonym injection and
       position increments.  The "Tools and extensions" chapter covers
       the WordNet feature found in the Lucene sandbox.

Press return to continue...

Running lia.analysis.synonym.SynonymAnalyzerViewer...

1: [quick] [warm] [straightaway] [spry] [speedy] [ready] [quickly] 
[promptly] [prompt] [nimble] [immediate] [flying] [fast] [agile]
2: [brown] [brownness] [brownish]
3: [fox] [trick] [throw] [slyboots] [fuddle] [fob] [dodger] 
[discombobulate] [confuse] [confound] [befuddle] [bedevil]
4: [jumps]
5: [over] [o] [across]
6: [lazy] [faineant] [indolent] [otiose] [slothful]
7: [dogs]

1: [oh]
2: [we]
3: [get] [acquire] [aim] [amaze] [arrest] [arrive] [baffle] [beat] 
[become] [beget] [begin] [bewilder] [bring] [can] [capture] [catch] 
[cause] [come] [commence] [contract] [convey] [develop] [draw] [drive] 
[dumbfound] [engender] [experience] [father] [fetch] [find] [fix] 
[flummox] [generate] [go] [gravel] [grow] [have] [incur] [induce] [let] 
[make] [may] [mother] [mystify] [nonplus] [obtain] [perplex] [produce] 
[puzzle] [receive] [scram] [sire] [start] [stimulate] [stupefy] 
[stupify] [suffer] [sustain] [take] [trounce] [undergo]
4: [both]
5: [kinds]
6: [country] [state] [nationality] [nation] [land] [commonwealth] [area]
7: [western] [westerly]
8: [bb]

BUILD SUCCESSFUL
Total time: 10 seconds
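
To make the query-time option described above concrete, here is a small plain-Java sketch that uses no Lucene classes. It is not the SynonymAnalyzer from Lucene in Action; the class QueryTimeSynonyms and the method expand are hypothetical names, and a hand-written map stands in for the WordNet lookup. Each term that has synonyms is turned into an OR group, which is why query-time injection yields more complex queries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of query-time synonym injection (not a Lucene class). */
final class QueryTimeSynonyms {

    /** Expands each whitespace-separated term into an OR group of the term and its synonyms. */
    static String expand(String terms, Map<String, List<String>> synonyms) {
        List<String> parts = new ArrayList<>();
        for (String term : terms.trim().split("\\s+")) {
            List<String> alternatives = new ArrayList<>();
            alternatives.add(term);
            alternatives.addAll(synonyms.getOrDefault(term, Collections.<String>emptyList()));
            // Terms without synonyms are passed through unchanged.
            parts.add(alternatives.size() == 1
                    ? term
                    : "(" + String.join(" OR ", alternatives) + ")");
        }
        return String.join(" ", parts);
    }
}
```

Because the map is consulted on every call, the synonym set can change between queries without touching the index, which is the trade-off mentioned above.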

