Posted to dev@lucene.apache.org by Doug Cutting <DC...@grandcentral.com> on 2001/10/31 00:59:43 UTC

RE: Re(2): Re: [Lucene-dev] Katakana characters in queries (a bug ?)

> From: Halácsy Péter [mailto:halacsy.peter@axelero.com]
>
> I think  IDENTIFIER_CHAR doesn't need to be the first char so my
> proposal is:
> <TERM:   ( ~["\"", " ", "\t", "(", ")", ":", "&", "|", "^", "*", "?",
> "~", "{", "}", "[", "]" ] )+ >

That looks like the right approach to me.

> On the other hand IDENTIFIER, ALPHA_CHAR, ALPHANUM_CHAR tokens are
> defined but are not used.

So let's remove them!

> ps: I don't understand the definition of WILD_TERM. It states that a
> wild term must end with identifier_char, so cannot end with 
> *. Is it the right definition?

Yes.  The code for handling a final asterisk (PrefixQuery) is different from
the general term wildcarding code (WildCardQuery).
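
To illustrate why a trailing asterisk is kept out of WILD_TERM, here is a
minimal Java sketch of the kind of dispatch described above. It is not the
actual QueryParser code: the class, the enum, and the classify method are
hypothetical names, and treating a lone "*" or a term with both an inner
wildcard and a trailing asterisk as a general wildcard is an assumption of
the sketch.

```java
/** Hypothetical sketch, not Lucene code: decides which kind of query a raw term implies. */
final class WildcardDispatchSketch {
    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String text) {
        // A trailing '*' only counts as a prefix marker when something precedes it.
        boolean trailingStar = text.length() > 1 && text.endsWith("*");
        String body = trailingStar ? text.substring(0, text.length() - 1) : text;

        // Any wildcard left in the body needs the general wildcard code path (assumption).
        if (body.indexOf('*') >= 0 || body.indexOf('?') >= 0) {
            return Kind.WILDCARD;
        }
        // Otherwise a trailing '*' is the simpler prefix case, and no '*' is a plain term.
        return trailingStar ? Kind.PREFIX : Kind.TERM;
    }

    public static void main(String[] args) {
        System.out.println(classify("foo"));   // TERM
        System.out.println(classify("foo*"));  // PREFIX
        System.out.println(classify("f?o"));   // WILDCARD
    }
}
```

Keeping the prefix case separate means the grammar never has to decide
between the two code paths inside a single token.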

These changes yield the following token definitions in QueryParser.jj:

<*> TOKEN : {
  <#_NUM_CHAR:   ["0"-"9"] >
| <#_TERM_CHAR: ~["\"", " ", "\t", "(", ")", ":", "&", "|",
                  "^", "*", "?", "~", "{", "}", "[", "]" ] >
| <#_NEWLINE:    ( "\r\n" | "\r" | "\n" ) >
| <#_WHITESPACE: ( " " | "\t" ) >
| <#_QCHAR:      ( "\\" (<_NEWLINE> | ~["a"-"z", "A"-"Z", "0"-"9"] ) ) >
| <#_RESTOFLINE: (~["\r", "\n"])* >
}

<DEFAULT> TOKEN : {
  <AND:       ("AND" | "&&") >
| <OR:        ("OR" | "||") >
| <NOT:       ("NOT" | "!") >
| <PLUS:      "+" >
| <MINUS:     "-" >
| <LPAREN:    "(" >
| <RPAREN:    ")" >
| <COLON:     ":" >
| <CARAT:     "^" >
| <STAR:      "*" >
| <QUOTED:     "\"" (~["\""])+ "\"">
| <NUMBER:    (["+","-"])? (<_NUM_CHAR>)+ "." (<_NUM_CHAR>)+ >
| <TERM:      (<_TERM_CHAR>)+ >
| <FUZZY:     "~" >
| <WILDTERM:  <_TERM_CHAR>
              ( ~["\"", " ", "\t", "(", ")", ":", "&", "|", "^", "~", "{",
                  "}", "[", "]" ] )+ <_TERM_CHAR>>
| <RANGEIN:   "[" (~["]"])+ "]">
| <RANGEEX:   "{" (~["}"])+ "}">
}

<DEFAULT> SKIP : {
  <<_WHITESPACE>>
}
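
For anyone who wants a quick sanity check without regenerating the parser,
the _TERM_CHAR definition above can be approximated in plain Java. This is
only a sketch: the class and method names are hypothetical, and it checks
the character class alone, ignoring how the generated tokenizer chooses
between competing tokens such as PLUS, MINUS, AND or NUMBER.

```java
/** Hypothetical helper mirroring the _TERM_CHAR exclusion list; not part of Lucene. */
final class TermCharSketch {
    // The sixteen characters that _TERM_CHAR excludes: " space tab ( ) : & | ^ * ? ~ { } [ ]
    private static final String EXCLUDED = "\" \t():&|^*?~{}[]";

    /** True when c is allowed by the negated character list. */
    static boolean isTermChar(char c) {
        return EXCLUDED.indexOf(c) < 0;
    }

    /** True when the whole string would match (<_TERM_CHAR>)+ taken in isolation. */
    static boolean isTerm(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isTermChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Four Katakana characters, written as escapes to avoid source-encoding issues.
        System.out.println(isTerm("\u30AB\u30BF\u30AB\u30CA")); // true
        System.out.println(isTerm("foo*"));                     // false
    }
}
```

Because the list is negated, any character outside it, Katakana included,
is accepted without having to enumerate Unicode ranges.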

Can folks try these and tell me if they solve the problem?

Ideally we should add some cases for this to the junit tests, but I can't
get junit to work at all right now...  Have the junit tests ever run
correctly from ant since the move to Jakarta?  Can someone more familiar
with junit have a look at this?

Doug

