Posted to java-user@lucene.apache.org by Bharatbhushan_Shetty <Bh...@infosys.com> on 2003/07/14 10:24:39 UTC

HELP in QueryParsing !!

Hi 

I need some help with query parsing. A few things don't
seem to work as I expected. Have a look at the code at the end
before reading my observations.

The documents contain the following information:
D1 = c++ hello bharat
D2 = c hello sharat 
D3 = hello bharat

Observations
============
Input      QueryCreated   Remarks
c\+\+      c              (Escape character not working)
c++        -              (Parser throws an exception) [NOTE-1]
c*         c*             (Wildcard works perfectly fine)
*c         -              (Throws org.apache.lucene.queryParser.TokenMgrError) [NOTE-2]
"c         -              (Throws an exception) [NOTE-3]
Hello ""   -              (Throws an exception)


[NOTE-1] : The characters - ( ) { } ! [ ] etc. behave in the same
           manner as "+" shown above.

[NOTE-2] : It looks like a wildcard cannot be the first character of
           the query.

[NOTE-3] : I guess this validation can be done after accepting the
           user input.


My Comments/Questions
=====================
Does that mean that the program should take care of validating the
user input and then pass the query string to QueryParser?

If yes, I guess there might be some more validations that should be
done that I have missed. Can anyone shed some light on the
validations that the program should take care of?


Code
====

import java.io.IOException;
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.*;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;

public class TestQueryParser
{
  public static void main(String[] argv)
  {
    try
    {
      IndexWriter writer = new IndexWriter("indexbbs", new StandardAnalyzer(), true);
  
      Document d1 = new Document();
      d1.add(Field.Text("f1", "c++ hello bharat"));
      writer.addDocument(d1);
  
      Document d2 = new Document();
      d2.add(Field.Text("f1", "c hello sharat"));
      writer.addDocument(d2);
  
      Document d3 = new Document();
      d3.add(Field.Text("f1", "hello bharat"));
      writer.addDocument(d3);
  
      writer.optimize();
      writer.close();
  
      String qString = "";
      try
      {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("Input for f1: ");
        qString = in.readLine();
      }
      catch(Exception e)
      { System.out.println("Exiting..." + e.getMessage()); return; }
  
      System.out.println("");
  
      Searcher searcher = new IndexSearcher("indexbbs");
      Analyzer analyzer = new StandardAnalyzer();

      QueryParser qp = new QueryParser("f1", analyzer);
  
      Query query = qp.parse(qString);
 
      System.out.println("QueryInput:" + qString);
      System.out.println("QueryCreated:" + query.toString("f1"));
  
      Hits hits = searcher.search(query);
      for (int i=0; i<hits.length(); ++i)
        System.out.println("Document = f1:" + hits.doc(i).get("f1"));
    }
    catch(Exception e)
    { System.out.println("Exception:" + e.getMessage());return; } 
  }
}

Thanks 
Bharat

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: HELP in QueryParsing !!

Posted by Victor Hadianto <vi...@nuix.com.au>.

> Input:   QueryCreated     Remarks
> c\+\+      c           (Escape character not working)

The StandardTokenizer and QueryParser will drop the ++ signs. This problem is 
similar to one in a recent thread. Search the archive for the following subject:
'-' character not interpreted correctly in field names

You may be able to implement a solution similar to the one that I've posted. 

Actually, your query got me interested. I've tried my solution for c-- and the 
-- signs are dropped. This is because I define DASHESWORD as 

| <DASHESWORD: <ALPHANUM> ("-" <ALPHANUM>)+ >

This lets you search for t-shirt, but not tshirt-. Yet another QueryParser 
peculiarity :)

If you absolutely have to search for c++, then I suggest you define another 
token that encompasses alphanumeric words and the plus sign. For example 
(modify StandardTokenizer.jj):

<MYTOKEN: (<ALPHANUM>|"+")+ >

Then add the line:

token = <MYTOKEN>

in the next() method. This may work.
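
To get a feel for what such a token would accept, here is a rough Java regex 
stand-in. It is only a sketch: PlusTokenSketch and tokens() are hypothetical 
names, the regex assumes ALPHANUM means letters and digits, and it is not the 
JavaCC grammar itself, so the real tokenizer may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of Lucene or StandardTokenizer. */
final class PlusTokenSketch {
    // Approximates <MYTOKEN: (<ALPHANUM>|"+")+ >, assuming ALPHANUM is
    // one or more letters or digits.
    private static final Pattern MYTOKEN = Pattern.compile("[\\p{L}\\p{N}+]+");

    /** Returns every maximal run of letters, digits, and plus signs in the text. */
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = MYTOKEN.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("c++ hello bharat")); // prints [c++, hello, bharat]
    }
}
```

Note that a pattern of this shape also accepts a bare "+" as a token, which 
may or may not be what you want.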

> c++        -           (Parser throws an exception) [NOTE-1]
As expected.
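
Unescaped, + is query syntax, which is why the parser rejects c++. One way to 
avoid the exception is to backslash-escape such characters before calling 
QueryParser. A minimal sketch follows; QueryEscaper and its SPECIAL set are 
hypothetical, not part of Lucene, and the character set is an assumption based 
on NOTE-1 and the other cases in your table. As you observed, escaping only 
avoids the parse failure: with StandardAnalyzer, c\+\+ still ends up as the 
query c.

```java
/** Hypothetical helper, not part of Lucene: escapes query-syntax characters. */
final class QueryEscaper {
    // Characters treated as query syntax here. This set is an assumption;
    // check it against the QueryParser version you are using.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\";

    /** Returns the input with a backslash placed before each special character. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (SPECIAL.indexOf(ch) >= 0) {
                out.append('\\');
            }
            out.append(ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("c++")); // prints c\+\+
    }
}
```

Escaping * and ? this way also turns off wildcards, so drop them from SPECIAL 
if users should still be able to type c*.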

> *c         -           (throws an exception -   [NOTE-2]
There have been a number of discussions on this subject; search the mailing list 
for more information. 

> Does that mean that the program should taken care of validating the
> User input and then pass the query string to QueryParser?

It depends on how you look at it. QueryParser will throw a ParseException if it has 
parsing issues, so you can in some way treat this as the validation.
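
If you would rather reject bad input before it reaches QueryParser, the cases 
from your table can be checked up front. The sketch below is one possible 
approach, not something Lucene provides: QueryPreCheck and problem() are 
hypothetical names, and the three checks (leading wildcard, unbalanced quote, 
empty phrase) only cover the failures listed in this thread. Also keep in mind 
that the TokenMgrError you saw is an Error in JavaCC-generated parsers, not an 
Exception, so a catch (Exception e) block alone will not intercept it.

```java
/** Hypothetical pre-check, not part of Lucene. */
final class QueryPreCheck {
    /** Returns null if the input passes these checks, otherwise a short reason. */
    static String problem(String input) {
        String q = input.trim();
        if (q.isEmpty()) {
            return "empty query";
        }
        // NOTE-2: a term must not start with a wildcard.
        for (String term : q.split("\\s+")) {
            char first = term.charAt(0);
            if (first == '*' || first == '?') {
                return "term starts with a wildcard: " + term;
            }
        }
        // NOTE-3: double quotes must be balanced and phrases must not be empty.
        int open = -1; // index of the currently open quote, or -1 if none
        for (int i = 0; i < q.length(); i++) {
            char ch = q.charAt(i);
            if (ch == '\\') {
                i++; // skip the escaped character
                continue;
            }
            if (ch != '"') {
                continue;
            }
            if (open < 0) {
                open = i;
            } else {
                if (q.substring(open + 1, i).trim().isEmpty()) {
                    return "empty phrase";
                }
                open = -1;
            }
        }
        return open >= 0 ? "unbalanced double quote" : null;
    }

    public static void main(String[] args) {
        System.out.println(problem("\"c")); // prints unbalanced double quote
    }
}
```

Only pass the string on to QueryParser when problem() returns null, and still 
catch ParseException around the parse call for anything these checks miss.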


HTH,
victor

