Posted to java-user@lucene.apache.org by Sanyi <ne...@yahoo.com> on 2004/12/21 09:04:29 UTC
Synonyms for AND/OR/NOT operators
Hi!
What is the simplest way to add synonyms for AND/OR/NOT operators?
I'd like to support two sets of operator words, so people can use either the original English
operators or my custom ones for our local language.
Thank you for your attention!
Sanyi
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Synonyms for AND/OR/NOT operators
Posted by Morus Walter <mo...@tanto.de>.
Sanyi writes:
> Well, I guess I'd better recognize the operator synonyms and replace them with the originals
> before passing them to QueryParser. I don't feel comfortable tampering with Lucene's source code.
>
Apart from knowing how to compile Lucene (including the JavaCC code
generation), you should only need to change
<DEFAULT> TOKEN : {
<AND: ("AND" | "&&") >
| <OR: ("OR" | "||") >
| <NOT: ("NOT" | "!") >
to
<DEFAULT> TOKEN : {
<AND: ("AND" | "<insert your version of and here>" | "&&") >
| <OR: ("OR" | "<insert your version of or here>" | "||") >
| <NOT: ("NOT" | "<insert your version of not here>" | "!") >
in jakarta-lucene/src/java/org/apache/lucene/queryParser/QueryParser.jj
Replacing the operators before the query is parsed might be hard to do
correctly if you want to handle cases like »"a AND b" OR c«, which is a
query for the phrase "a AND b" or the token c.
Morus
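A minimal sketch of the preprocessing route, for illustration only: it rewrites localized operator words to AND/OR/NOT before the string is handed to QueryParser, and leaves anything inside double quotes untouched so that a phrase such as "a UND b" is not altered. The class name OperatorSynonymRewriter and the synonym table are hypothetical and not part of Lucene. The scanner only knows about double quotes, parentheses, whitespace and backslash escapes, so it approximates the query syntax and is not a substitute for the grammar change.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps localized operator words to AND/OR/NOT outside quoted phrases. */
final class OperatorSynonymRewriter {
    private final Map<String, String> synonyms;

    OperatorSynonymRewriter(Map<String, String> synonyms) {
        this.synonyms = new HashMap<>(synonyms);
    }

    String rewrite(String query) {
        StringBuilder out = new StringBuilder(query.length());
        int n = query.length();
        boolean inQuotes = false;
        int i = 0;
        while (i < n) {
            char c = query.charAt(i);
            if (inQuotes) {
                // Inside a phrase: copy verbatim, keeping escaped characters together.
                if (c == '\\' && i + 1 < n) {
                    out.append(c).append(query.charAt(i + 1));
                    i += 2;
                    continue;
                }
                if (c == '"') {
                    inQuotes = false;
                }
                out.append(c);
                i++;
            } else if (c == '"') {
                inQuotes = true;
                out.append(c);
                i++;
            } else if (isDelimiter(c)) {
                out.append(c);
                i++;
            } else {
                // Collect one bare word; an escaped character stays part of the word.
                int start = i;
                while (i < n) {
                    char w = query.charAt(i);
                    if (w == '\\' && i + 1 < n) {
                        i += 2;
                        continue;
                    }
                    if (isDelimiter(w) || w == '"') {
                        break;
                    }
                    i++;
                }
                String word = query.substring(start, i);
                // Exact, case-sensitive match only; anything else passes through unchanged.
                out.append(synonyms.getOrDefault(word, word));
            }
        }
        return out.toString();
    }

    private static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == '(' || c == ')';
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("AANNDD", "AND"); // made-up operator words from this thread
        table.put("OORR", "OR");
        table.put("NNOOTT", "NOT");
        OperatorSynonymRewriter rewriter = new OperatorSynonymRewriter(table);
        // prints: (cat OR kitty) AND black AND tail
        System.out.println(rewriter.rewrite("(cat OORR kitty) AANNDD black AANNDD tail"));
    }
}
```

Because the table is passed in at construction time, a different table can be picked per request, whereas the grammar change fixes the operator words when the parser is generated. Words attached to other syntax, such as field:AANNDD or +AANNDD, are left alone in this sketch.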
Re: Synonyms for AND/OR/NOT operators
Posted by Sanyi <ne...@yahoo.com>.
Well, I guess I'd better recognize the operator synonyms and replace them with the originals
before passing them to QueryParser. I don't feel comfortable tampering with Lucene's source code.
Anyway, thanx for the answers.
Sanyi
Re: Synonyms for AND/OR/NOT operators
Posted by Morus Walter <mo...@tanto.de>.
Erik Hatcher writes:
> On Dec 21, 2004, at 3:04 AM, Sanyi wrote:
> > What is the simplest way to add synonyms for AND/OR/NOT operators?
> > I'd like to support two sets of operator words, so people can use
> > either the original English
> > operators or my custom ones for our local language.
>
> There are two options that I know of: 1) add synonyms during indexing
> and 2) add synonyms during querying. Generally this would be done
> using a custom analyzer.
I guess you misunderstood the question.
I think he wants to know how to create a query parser that understands
something like 'a UND b' as well as 'a AND b' to support localized
operator names (German in this case).
AFAIK that can only be done by copying the query parser's JavaCC source and
adding the operators there.
Shouldn't be difficult, though it's a bit ugly since it implies code
duplication. And there will be no way of choosing the operators dynamically
at runtime. One will need to have different query parsers for different
languages.
Morus
Re: Synonyms for AND/OR/NOT operators
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Wow, I really did misunderstand. My apologies.
Yes, you will need to fork QueryParser.jj and install JavaCC to build
your custom parser. It should be pretty trivial to add alternatives to
AND(+)/OR/NOT(-).
Erik
On Dec 21, 2004, at 4:42 AM, Sanyi wrote:
> Hi!
>
> I think we're talking about different things.
> My question is about using synonyms for AND/OR/NOT operators, not
> about synonyms of words in the
> index.
> For example, in some language: AND = AANNDD; OR = OORR; NOT = NNOOTT
>
> So, the user can enter:
> (cat OR kitty) AND black AND tail
>
> or:
>
> (cat OORR kitty) AANNDD black AANNDD tail
>
> Both sets of operators must work.
> It must be some kind of query parser modification or parameterization,
> so it has nothing to do with the index.
>
> I hope I was more specific now ;)
>
> Thanx,
> Sanyi
Re: Synonyms for AND/OR/NOT operators
Posted by Sanyi <ne...@yahoo.com>.
Hi!
I think we're talking about different things.
My question is about using synonyms for AND/OR/NOT operators, not about synonyms of words in the
index.
For example, in some language: AND = AANNDD; OR = OORR; NOT = NNOOTT
So, the user can enter:
(cat OR kitty) AND black AND tail
or:
(cat OORR kitty) AANNDD black AANNDD tail
Both sets of operators must work.
It must be some kind of query parser modification or parameterization, so it has nothing to do with
the index.
I hope I was more specific now ;)
Thanx,
Sanyi
Re: Synonyms for AND/OR/NOT operators
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 21, 2004, at 3:04 AM, Sanyi wrote:
> What is the simplest way to add synonyms for AND/OR/NOT operators?
> I'd like to support two sets of operator words, so people can use
> either the original English
> operators or my custom ones for our local language.
There are two options that I know of: 1) add synonyms during indexing
and 2) add synonyms during querying. Generally this would be done
using a custom analyzer.
If the synonym mappings are static and you don't mind a larger index,
adding them during indexing avoids the complexity of rewriting the
query. Injecting synonyms during querying allows the synonym mappings
to change dynamically, though it does produce more complex queries.
Here's an example you'll find with the source code distribution of
Lucene in Action which uses WordNet to look up synonyms.
Erik
p.s. I'm sensitive to over-marketing Lucene in Action in this forum as
it would bother me to constantly see an advertisement. You can be sure
that any mentions of it from me will coincide with concrete examples
(which are freely available) that are directly related to questions
being asked.
% ant -emacs SynonymAnalyzerViewer
Buildfile: build.xml
check-environment:
compile:
build-test-index:
build-perf-index:
prepare:
SynonymAnalyzerViewer:
Using a custom SynonymAnalyzer, two fixed strings are
analyzed with the results displayed. Synonyms, from the
WordNet database, are injected into the same positions
as the original words.
See the "Analysis" chapter for more on synonym injection and
position increments. The "Tools and extensions" chapter covers
the WordNet feature found in the Lucene sandbox.
Press return to continue...
Running lia.analysis.synonym.SynonymAnalyzerViewer...
1: [quick] [warm] [straightaway] [spry] [speedy] [ready] [quickly]
[promptly] [prompt] [nimble] [immediate] [flying] [fast] [agile]
2: [brown] [brownness] [brownish]
3: [fox] [trick] [throw] [slyboots] [fuddle] [fob] [dodger]
[discombobulate] [confuse] [confound] [befuddle] [bedevil]
4: [jumps]
5: [over] [o] [across]
6: [lazy] [faineant] [indolent] [otiose] [slothful]
7: [dogs]
1: [oh]
2: [we]
3: [get] [acquire] [aim] [amaze] [arrest] [arrive] [baffle] [beat]
[become] [beget] [begin] [bewilder] [bring] [can] [capture] [catch]
[cause] [come] [commence] [contract] [convey] [develop] [draw] [drive]
[dumbfound] [engender] [experience] [father] [fetch] [find] [fix]
[flummox] [generate] [go] [gravel] [grow] [have] [incur] [induce] [let]
[make] [may] [mother] [mystify] [nonplus] [obtain] [perplex] [produce]
[puzzle] [receive] [scram] [sire] [start] [stimulate] [stupefy]
[stupify] [suffer] [sustain] [take] [trounce] [undergo]
4: [both]
5: [kinds]
6: [country] [state] [nationality] [nation] [land] [commonwealth] [area]
7: [western] [westerly]
8: [bb]
BUILD SUCCESSFUL
Total time: 10 seconds
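The listing above shows synonyms stacked at the same position as the original word. A rough sketch of that idea in plain Java follows. It is not the SynonymAnalyzer from Lucene in Action and does not use Lucene's analysis API. SynonymStacker and its hard-coded table are hypothetical, and the two entries are copied from the output above, not looked up in WordNet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of stacking synonyms at the position of the original word. */
final class SynonymStacker {
    private final Map<String, List<String>> synonyms;

    SynonymStacker(Map<String, List<String>> synonyms) {
        this.synonyms = new HashMap<>(synonyms);
    }

    /** Returns one slot per input word: the word itself first, then any synonyms. */
    List<List<String>> stack(String text) {
        List<List<String>> positions = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace or empty input
            }
            List<String> slot = new ArrayList<>();
            slot.add(word);
            slot.addAll(synonyms.getOrDefault(word, Collections.<String>emptyList()));
            positions.add(slot);
        }
        return positions;
    }

    /** Formats the slots in the same "position: [token] [token]" style as the listing. */
    static String format(List<List<String>> positions) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < positions.size(); i++) {
            out.append(i + 1).append(':');
            for (String token : positions.get(i)) {
                out.append(" [").append(token).append(']');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new HashMap<>();
        table.put("brown", Arrays.asList("brownness", "brownish"));
        table.put("western", Arrays.asList("westerly"));
        SynonymStacker stacker = new SynonymStacker(table);
        System.out.print(format(stacker.stack("western brown dogs")));
    }
}
```

Whether the stacking happens while indexing or while building the query is the trade-off described earlier in this message: the same table lookup applies either way.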