Posted to java-user@lucene.apache.org by Allen Atamer <at...@sympatico.ca> on 2008/04/08 23:59:24 UTC
designing a dictionary filter with multiple word entries
My dictionary filter currently implements next() and everything works well
when dictionary entries are replaced one-to-one. For example: Can =>
Canada.
A problem arises when I try to replace a word with more than one word. Going
through next(), I encounter "shutdown", but the dictionary entry maps
Shutdown => shut down (two words). I construct a replacement term according
to the instructions in the Javadoc, but the search does not match any
substrings "shut" or "down" in my database. I debugged it and found that
QueryParser is converting my replaced text into PhraseQuery objects instead
of BooleanQuery objects.
My code to replace the string is below:
Token teachToken = new Token();
teachToken.resizeTermBuffer(replacementTerm.length());
char [] termBuffer = teachToken.termBuffer();
for (int i = 0; i < replacementTerm.length(); i++) {
    termBuffer[i] = replacementTerm.charAt(i);
}
teachToken.setTermLength(replacementTerm.length());
this.tokenQueue.push(teachToken);
return teachToken;
Instead of [field1]:shut down, it is searching with [field1]:"shut down".
How can I construct the replacement terms so that queries are formed
properly, and I don't violate the next() contract?
Thanks a bunch!
Allen
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: designing a dictionary filter with multiple word entries
Posted by Allen Atamer <at...@sympatico.ca>.
Mathieu,
Your suggestion pointed me in the right direction. I was using a private
queue instead of using the inherited TokenStream from the superclass.
Thanks.
However, that still didn't stop the PhraseQuery problem. I also needed to add
one more line to my replacement code:
newToken.setPositionIncrement(0);
By setting the position increment to zero, so that more than one term sits at
the same position, I force MultiFieldQueryParser to construct a BooleanQuery
instead of a PhraseQuery.
Allen
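Why the position increment changes the query type can be sketched with a small, self-contained model. It is a hedged reconstruction of the decision the Lucene 2.x QueryParser makes after analyzing one field's query text, not Lucene's own code: the class and method names (QueryTypeModel, chooseQueryType) are hypothetical, and the analyzed tokens are reduced to their position increments as plain ints.

```java
import java.util.List;

// Simplified model (NOT Lucene source) of how a Lucene 2.x-era QueryParser
// picks a query type for one field. Each int is one analyzed token's
// position increment, in the order the analyzer emitted the tokens.
class QueryTypeModel {

    static String chooseQueryType(List<Integer> positionIncrements) {
        if (positionIncrements.isEmpty()) {
            return "none";                      // analyzer produced no tokens
        }
        if (positionIncrements.size() == 1) {
            return "TermQuery";                 // a single token
        }
        int positionCount = 0;
        boolean severalTokensAtSamePosition = false;
        for (int inc : positionIncrements) {
            if (inc != 0) {
                positionCount += inc;           // token starts a new position
            } else {
                severalTokensAtSamePosition = true; // token stacked on the previous one
            }
        }
        if (severalTokensAtSamePosition) {
            // Everything at one position is treated like synonyms (OR of terms).
            return positionCount == 1 ? "BooleanQuery" : "MultiPhraseQuery";
        }
        return "PhraseQuery";                   // tokens at consecutive positions
    }

    public static void main(String[] args) {
        // "shut" and "down" as two consecutive tokens (increment 1 each)
        System.out.println(chooseQueryType(List.of(1, 1)));  // PhraseQuery
        // "down" stacked on "shut" with increment 0
        System.out.println(chooseQueryType(List.of(1, 0)));  // BooleanQuery
    }
}
```

With increments (1, 1) the model yields a PhraseQuery, which is the [field1]:"shut down" behaviour from the original post; with "down" at increment 0 it yields a BooleanQuery, matching the fix above.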
-----Original Message-----
From: Mathieu Lecarme [mailto:mathieu@garambrogne.net]
Sent: Wednesday, April 09, 2008 3:42 AM
To: java-user@lucene.apache.org
Subject: Re: designing a dictionary filter with multiple word entries
Use a private stack. When you replace a word, fill the stack with the
replacement words, and next() will pop from it. When the stack is empty,
feed it again from the input. So the first call to next() will return
"shut", and the second "down".
M.
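The private-stack approach can be sketched as below, combined with the setPositionIncrement(0) change from Allen's follow-up. To keep it runnable without Lucene on the classpath, SimpleToken, the Iterator input, and DictionaryFilterSketch are hypothetical stand-ins for Token and a TokenFilter written against the Lucene 2.x next() API; only the control flow is meant to carry over.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free sketch of a dictionary filter using a private stack.
class DictionaryFilterSketch {

    // Stand-in for Lucene's Token: term text plus position increment.
    static final class SimpleToken {
        final String term;
        final int positionIncrement;
        SimpleToken(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
        @Override public String toString() { return term + "/" + positionIncrement; }
    }

    private final Iterator<SimpleToken> input;          // upstream tokens
    private final Map<String, List<String>> dictionary;  // e.g. shutdown -> [shut, down]
    private final Deque<SimpleToken> pending = new ArrayDeque<>(); // the private stack

    DictionaryFilterSketch(Iterator<SimpleToken> input, Map<String, List<String>> dictionary) {
        this.input = input;
        this.dictionary = dictionary;
    }

    // Returns the next token, or null at end of stream (as Lucene 2.x next() does).
    SimpleToken next() {
        if (!pending.isEmpty()) {
            return pending.pop();                  // drain the stack first
        }
        if (!input.hasNext()) {
            return null;                           // stack empty and input exhausted
        }
        SimpleToken token = input.next();          // "feed it again" from the input
        List<String> replacement = dictionary.get(token.term.toLowerCase());
        if (replacement == null) {
            return token;                          // no dictionary entry: pass through
        }
        // Push in reverse so pop() returns the words in reading order.
        // Words after the first get increment 0 so they share one position.
        for (int i = replacement.size() - 1; i >= 1; i--) {
            pending.push(new SimpleToken(replacement.get(i), 0));
        }
        // The first word keeps the original token's increment.
        return new SimpleToken(replacement.get(0), token.positionIncrement);
    }

    public static void main(String[] args) {
        List<SimpleToken> in = List.of(new SimpleToken("Shutdown", 1), new SimpleToken("now", 1));
        DictionaryFilterSketch f = new DictionaryFilterSketch(
            in.iterator(), Map.of("shutdown", List.of("shut", "down")));
        for (SimpleToken t = f.next(); t != null; t = f.next()) {
            System.out.println(t);   // prints shut/1, then down/0, then now/1
        }
    }
}
```

Each call to next() returns exactly one token, so the next() contract holds. If the same filter also runs at index time, "down" is indexed at the same position as "shut", so a phrase query for "shut down" will not match those documents.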