Posted to java-user@lucene.apache.org by Antony Bowesman <ad...@teamware.com> on 2007/02/12 22:25:28 UTC

Re: search on colon ":" ending words

Not sure if you're still after a solution, but I had a similar issue and I 
modified QueryParser.jj to not treat : as a field name terminator, so work: 
would then just be given as work: to the analyzer and treated as a search term.

Antony


Felix Litman wrote:
> We want to be able to return a result regardless of whether users include a colon in the query.  So the queries 'work:' and 'work' should return the same result.
> 
> With the current parser, if a user enters 'work:' with a ":", Lucene does not return anything :-(.  This looks to me like a Lucene parser issue. We are wondering if there is any simple way to make the Lucene parser ignore the ":" in the query?
> 
> any thoughts?
> 
> Erick Erickson <er...@gmail.com> wrote: I've got to ask why you'd want to search on colons. Why not just index the
> words without colons and search without them too? Let's say you index the
> word "work:" Do you really want to have a search on "work" fail?
> 
> By and large, you're better off indexing and searching without
> punctuation....
> 
> Best
> Erick
> 
> On 1/28/07, Felix Litman  wrote:
>> Is there a simple way to turn off field-search syntax in the Lucene
>> parser, and have Lucene recognize words ending in a colon ":" as search
>> terms instead?
>>
>> Such words are very common occurrences for our documents (or any plain
>> text), but Lucene does not seem to find them. :-(
>>
>> Thank you,
>> Felix
>>
>>
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: search on colon ":" ending words

Posted by Erick Erickson <er...@gmail.com>.
I'd *strongly* advise doing it the simple way, that is, your replace.

1> it's simple and understandable.
2> next time you upgrade Lucene you, or the next poor programmer, will have
to remember/reimplement your change to the parser.
3> How will you ensure that others in your organization (and you 6 months
from now) won't spend lots of time wondering why ':' didn't work as a field
separator in the query parser? I flat guarantee this will cause you grief...
4> I don't want Otis, Erik and Yonik to have to spend time answering the
question "Why isn't ':' working as a field separator?" <G>.

Best
Erick

On 2/22/07, Felix Litman <f_...@pacbell.net> wrote:
>
> OK. Thank you.  We'll have to consider using this approach.
>
>   I guess the drawback here is that ":" will no longer work as a field
> operator. :-(
>
>   We were also considering using the following approach.
>
>   String newquery = query.replace(": ", " ");
>
>   It seems this way a colon should still work as a field operator if
> followed by a query term with no space in between
>
>   Thanks,
>   Felix.
>
> Antony Bowesman <ad...@teamware.com> wrote:
>   Felix Litman wrote:
> > Yes. thank you. How did you make that modification not to treat ":" as a
> > field-name terminator?
> >
> > Is it using this Or some other way?
>
> I removed the : handling stuff from QueryParser.jj in the method:
>
> Query Clause(String field) :
>
> I removed this section
> ---
> [
> LOOKAHEAD(2)
> (
> fieldToken=<TERM> <COLON> {field=discardEscapeChar(fieldToken.image);}
> | <STAR> <COLON> {field="*";}
> )
> ]
> ---
>
> and you can also remove the COLON and : related bits to do with start
> terms and escaped chars if you want to exclude treating : as a separator,
> but from memory, it's the above section that does the field recognition.
>
> Antony

Re: search on colon ":" ending words

Posted by Chris Hostetter <ho...@fucit.org>.
:   String newquery = query.replace(": ", " ");

you should be able to use a regex like so...

    String newquery = query.replaceAll(":(?!\\w)", "\\\\:");

...(i may have some extra/missing backslashes) to ensure that literal ":"
in your input which are not followed by a word character are "escaped" from
the query parser ... that way if your analyzer doesn't strip out the ":"
things will still work, and ":" at the end of your input will be properly
escaped (your current string replace will fail in this case)
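
For what it's worth, here is a small self-contained sketch of that escaping idea using only java.lang.String. The class and method names are made up for illustration and are not part of Lucene. The pattern is an assumption too: it uses a negative lookahead so that only a ":" that is not followed by a word character (including one at the very end of the input) gets a backslash, while field:term is left alone.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class ColonEscaper {

    /**
     * Puts a backslash in front of every ':' that is NOT followed by a
     * word character, e.g. "work:" at the end of a word or of the input.
     * A ':' directly followed by a word character (field:term) is kept.
     */
    static String escapeTrailingColons(String query) {
        // Regex:       ":(?!\\w)"  = a colon with no word character after it.
        // Replacement: "\\\\:"     = the two characters \\ and : at runtime,
        //              which replaceAll reads as a literal backslash plus ':'.
        return query.replaceAll(":(?!\\w)", "\\\\:");
    }

    public static void main(String[] args) {
        System.out.println(escapeTrailingColons("work: today"));
        System.out.println(escapeTrailingColons("work:"));
        System.out.println(escapeTrailingColons("title:work"));
    }
}
```

With this pattern a query such as title:work still reaches the query parser unchanged, so field syntax keeps working as long as the term follows the colon with no space in between.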



-Hoss




Re: search on colon ":" ending words

Posted by Felix Litman <f_...@pacbell.net>.
OK. Thank you.  We'll have to consider using this approach.
   
  I guess the drawback here is that ":" will no longer work as a field operator. :-(
   
  We were also considering using the following approach.
   
  String newquery = query.replace(": ", " "); 
   
  It seems this way a colon should still work as a field operator if followed by a query term with no space in between.
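
  A minimal sketch of what this replace-based approach would do, assuming plain java.lang.String (the class and method names are invented for illustration). A colon followed by a space is dropped and field:term is left untouched, but a colon at the very end of the input has no space after it and so stays in the query:

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class ColonSpaceStripper {

    /** Replaces every ": " (colon followed by a space) with a single space. */
    static String stripColonBeforeSpace(String query) {
        // String.replace(CharSequence, CharSequence) replaces all literal
        // occurrences; no regex is involved here.
        return query.replace(": ", " ");
    }

    public static void main(String[] args) {
        System.out.println(stripColonBeforeSpace("work: schedule")); // colon removed
        System.out.println(stripColonBeforeSpace("title:work"));     // unchanged
        System.out.println(stripColonBeforeSpace("work:"));          // trailing colon survives
    }
}
```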
   
  Thanks,
  Felix.
  
Antony Bowesman <ad...@teamware.com> wrote:
  Felix Litman wrote:
> Yes. thank you. How did you make that modification not to treat ":" as a field-name terminator?
> 
> Is it using this Or some other way?

I removed the : handling stuff from QueryParser.jj in the method:

Query Clause(String field) :

I removed this section
---
[
LOOKAHEAD(2)
(
fieldToken=<TERM> <COLON> {field=discardEscapeChar(fieldToken.image);}
| <STAR> <COLON> {field="*";}
)
]
---

and you can also remove the COLON and : related bits to do with start terms and 
escaped chars if you want to exclude treating : as a separator, but from memory, 
it's the above section that does the field recognition.

Antony





Re: search on colon ":" ending words

Posted by Antony Bowesman <ad...@teamware.com>.
Felix Litman wrote:
> Yes. thank you.  How did you make that modification not to treat ":" as a field-name terminator?
> 
> Is it using this  Or some other way?

I removed the : handling stuff from QueryParser.jj in the method:

Query Clause(String field) :

I removed this section
---
   [
     LOOKAHEAD(2)
     (
     fieldToken=<TERM> <COLON> {field=discardEscapeChar(fieldToken.image);}
     | <STAR> <COLON> {field="*";}
     )
   ]
---

and you can also remove the COLON and : related bits to do with start terms and 
escaped chars if you want to exclude treating : as a separator, but from memory, 
it's the above section that does the field recognition.

Antony




Re: search on colon ":" ending words

Posted by Felix Litman <f_...@pacbell.net>.
Yes. thank you.  How did you make that modification not to treat ":" as a field-name terminator?

Is it using this, or some other way?

String newquery = query.replace(":", " ");

Thank you,
Felix

Antony Bowesman <ad...@teamware.com> wrote:
Not sure if you're still after a solution, but I had a similar issue and I 
modified QueryParser.jj to not treat : as a field name terminator, so work: 
would then just be given as work: to the analyzer and treated as a search term.

Antony


Felix Litman wrote:
> We want to be able to return a result regardless of whether users include a colon in the query.  So the queries 'work:' and 'work' should return the same result.
> 
> With the current parser, if a user enters 'work:' with a ":", Lucene does not return anything :-(.  This looks to me like a Lucene parser issue. We are wondering if there is any simple way to make the Lucene parser ignore the ":" in the query?
> 
> any thoughts?
> 
> Erick Erickson  wrote: I've got to ask why you'd want to search on colons. Why not just index the
> words without colons and search without them too? Let's say you index the
> word "work:" Do you really want to have a search on "work" fail?
> 
> By and large, you're better off indexing and searching without
> punctuation....
> 
> Best
> Erick
> 
> On 1/28/07, Felix Litman  wrote:
>> Is there a simple way to turn off field-search syntax in the Lucene
>> parser, and have Lucene recognize words ending in a colon ":" as search
>> terms instead?
>>
>> Such words are very common occurrences for our documents (or any plain
>> text), but Lucene does not seem to find them. :-(
>>
>> Thank you,
>> Felix
>>
>>
> 
> 

