Posted to java-user@lucene.apache.org by Antony Bowesman <ad...@teamware.com> on 2007/02/12 22:25:28 UTC
Re: search on colon ":" ending words
Not sure if you're still after a solution, but I had a similar issue and I
modified QueryParser.jj to not treat : as a field name terminator, so work:
would then just be given as work: to the analyzer and treated as a search term.
Antony
Felix Litman wrote:
> We want to be able to return a result regardless of whether users include a colon in the query. So the queries 'work:' and 'work' should return the same result.
>
> With the current parser, if a user enters 'work:' with a ":", Lucene does not return anything :-(. It seems to me to be a Lucene parser issue.... We are wondering if there is any simple way to make the Lucene parser ignore the ":" in the query?
>
> any thoughts?
>
> Erick Erickson <er...@gmail.com> wrote:
> I've got to ask why you'd want to search on colons. Why not just index the
> words without colons and search without them too? Let's say you index the
> word "work:" Do you really want to have a search on "work" fail?
>
> By and large, you're better off indexing and searching without
> punctuation....
>
> Best
> Erick
>
> On 1/28/07, Felix Litman wrote:
>> Is there a simple way to turn off field-search syntax in the Lucene
>> parser, and have Lucene recognize words ending in a colon ":" as search
>> terms instead?
>>
>> Such words are very common occurrences for our documents (or any plain
>> text), but Lucene does not seem to find them. :-(
>>
>> Thank you,
>> Felix
>>
>>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: search on colon ":" ending words
Posted by Erick Erickson <er...@gmail.com>.
I'd *strongly* advise doing it the simple way, that is, your replace.
1> it's simple and understandable.
2> next time you upgrade Lucene you, or the next poor programmer, will have
to remember/reimplement your change to the parser.
3> How will you ensure that others in your organization (and you 6 months
from now) won't spend lots of time wondering why ':' didn't work as a field
separator in the query parser? I flat guarantee this will cause you grief...
4> I don't want Otis, Erik and Yonik to have to spend time answering the
question "Why isn't ':' working as a field separator?" <G>.
Best
Erick
On 2/22/07, Felix Litman <f_...@pacbell.net> wrote:
>
> OK. Thank you. We'll have to consider using this approach.
>
> I guess the drawback here is that ":" will no longer work as a field
> operator. ?:-(
>
> We were also considering using the following approach.
>
> String newquery = query.replace(": ", " ");
>
> It seems this way a colon should still work as a field operator if
> followed by a query term with no space in between.
>
> Thanks,
> Felix.
>
> Antony Bowesman <ad...@teamware.com> wrote:
> Felix Litman wrote:
> > Yes. thank you. How did you make that modification not to treat ":" as a field-name terminator?
> >
> > Is it using this Or some other way?
>
> I removed the : handling stuff from QueryParser.jj in the method:
>
> Query Clause(String field) :
>
> I removed this section
> ---
> [
> LOOKAHEAD(2)
> (
> fieldToken=<TERM> <COLON> {field=discardEscapeChar(fieldToken.image);}
> | <STAR> <COLON> {field="*";}
> )
> ]
> ---
>
> and you can also remove the COLON and : related bits to do with start terms and
> escaped chars if you want to exclude treating : as a separator, but from memory,
> it's the above section that does the field recognition.
>
> Antony
Re: search on colon ":" ending words
Posted by Chris Hostetter <ho...@fucit.org>.
: String newquery = query.replace(": ", " ");
you should be able to use a regex like so...
String newquery = query.replaceAll(":\\B", "\\\\:");
...(i may have some extra/missing backslashes) to ensure that literal ":"
in your input which are not followed by a word character are "escaped" from the
query parser ... that way if your analyzer doesn't strip out the ":"
things will still work, and ":" at the end of your input will be properly
escaped (your current string replace will fail in this case)
-Hoss
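For reference, here is a minimal, self-contained sketch of the escaping approach suggested above, assuming plain java.util.regex and no Lucene dependency. The class and method names (ColonEscaper, escapeDanglingColons) are hypothetical and only for illustration. It backslash-escapes every ":" that is not immediately followed by a word character, written here as a negative lookahead, so a dangling "work:" should reach the query parser as an escaped literal while "title:work" keeps its field syntax. It does not check whether a colon has already been escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene.
final class ColonEscaper {
    // A colon that is NOT followed by a word character (letter, digit, underscore),
    // i.e. a colon before whitespace, punctuation, or the end of the input.
    private static final Pattern DANGLING_COLON = Pattern.compile(":(?!\\w)");

    // Replacement text that produces a literal backslash followed by a colon.
    private static final String ESCAPED_COLON = Matcher.quoteReplacement("\\:");

    static String escapeDanglingColons(String query) {
        return DANGLING_COLON.matcher(query).replaceAll(ESCAPED_COLON);
    }

    public static void main(String[] args) {
        System.out.println(escapeDanglingColons("work:"));      // work\:
        System.out.println(escapeDanglingColons("work: hard")); // work\: hard
        System.out.println(escapeDanglingColons("title:work")); // title:work
    }
}
```

Whether the escaped colon then matches anything still depends on the analyzer: if it strips punctuation, "work\:" and "work" should end up as the same term.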
Re: search on colon ":" ending words
Posted by Felix Litman <f_...@pacbell.net>.
OK. Thank you. We'll have to consider using this approach.
I guess the drawback here is that ":" will no longer work as a field operator. ?:-(
We were also considering using the following approach.
String newquery = query.replace(": ", " ");
It seems this way a colon should still work as a field operator if followed by a query term with no space in between.
Thanks,
Felix.
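A minimal sketch of the replace idea described above, assuming plain java.util.regex and no Lucene dependency. The names ColonStripper and stripTrailingColons are hypothetical. Instead of a literal ": " replace, it removes a ":" that is followed by whitespace or that sits at the very end of the input, so a query consisting only of "work:" is covered too, while "title:work" is left alone as field syntax.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene.
final class ColonStripper {
    // A colon followed by whitespace, or a colon at the very end of the input.
    private static final Pattern TRAILING_COLON = Pattern.compile(":(?=\\s|\\z)");

    static String stripTrailingColons(String query) {
        // Drop only the colon; the whitespace after it is kept as-is.
        return TRAILING_COLON.matcher(query).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingColons("work:"));      // work
        System.out.println(stripTrailingColons("work: hard")); // work hard
        System.out.println(stripTrailingColons("title:work")); // title:work
    }
}
```

This only helps if the indexed terms have no trailing colon either, which depends on the analyzer used at index time.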
Antony Bowesman <ad...@teamware.com> wrote:
Felix Litman wrote:
> Yes. thank you. How did you make that modification not to treat ":" as a field-name terminator?
>
> Is it using this Or some other way?
I removed the : handling stuff from QueryParser.jj in the method:
Query Clause(String field) :
I removed this section
---
[
LOOKAHEAD(2)
(
fieldToken=<TERM> <COLON> {field=discardEscapeChar(fieldToken.image);}
| <STAR> <COLON> {field="*";}
)
]
---
and you can also remove the COLON and : related bits to do with start terms and
escaped chars if you want to exclude treating : as a separator, but from memory,
it's the above section that does the field recognition.
Antony
Re: search on colon ":" ending words
Posted by Antony Bowesman <ad...@teamware.com>.
Felix Litman wrote:
> Yes. thank you. How did you make that modification not to treat ":" as a field-name terminator?
>
> Is it using this Or some other way?
I removed the : handling stuff from QueryParser.jj in the method:
Query Clause(String field) :
I removed this section
---
[
LOOKAHEAD(2)
(
fieldToken=<TERM> <COLON> {field=discardEscapeChar(fieldToken.image);}
| <STAR> <COLON> {field="*";}
)
]
---
and you can also remove the COLON and : related bits to do with start terms and
escaped chars if you want to exclude treating : as a separator, but from memory,
it's the above section that does the field recognition.
Antony
Re: search on colon ":" ending words
Posted by Felix Litman <f_...@pacbell.net>.
Yes, thank you. How did you make that modification not to treat ":" as a field-name terminator?
Is it using this, or some other way?
String newquery = query.replace(":", " ");
Thank you,
Felix
Antony Bowesman <ad...@teamware.com> wrote:
Not sure if you're still after a solution, but I had a similar issue and I
modified QueryParser.jj to not treat : as a field name terminator, so work:
would then just be given as work: to the analyzer and treated as a search term.
Antony
Felix Litman wrote:
> We want to be able to return a result regardless of whether users include a colon in the query. So the queries 'work:' and 'work' should return the same result.
>
> With the current parser, if a user enters 'work:' with a ":", Lucene does not return anything :-(. It seems to me to be a Lucene parser issue.... We are wondering if there is any simple way to make the Lucene parser ignore the ":" in the query?
>
> any thoughts?
>
> Erick Erickson wrote:
> I've got to ask why you'd want to search on colons. Why not just index the
> words without colons and search without them too? Let's say you index the
> word "work:" Do you really want to have a search on "work" fail?
>
> By and large, you're better off indexing and searching without
> punctuation....
>
> Best
> Erick
>
> On 1/28/07, Felix Litman wrote:
>> Is there a simple way to turn off field-search syntax in the Lucene
>> parser, and have Lucene recognize words ending in a colon ":" as search
>> terms instead?
>>
>> Such words are very common occurrences for our documents (or any plain
>> text), but Lucene does not seem to find them. :-(
>>
>> Thank you,
>> Felix
>>
>>
>
>