Posted to java-user@lucene.apache.org by Bharatbhushan_Shetty <Bh...@infosys.com> on 2003/07/14 10:24:39 UTC
HELP in QueryParsing !!
Hi
I need some help with query parsing. There are a few things that don't
seem to work as I expected. Have a look at the code at the end
before reading my observations.
The documents contain the following information:
D1 = c++ hello bharat
D2 = c hello sharat
D3 = hello bharat
Observations
============
Input      QueryCreated   Remarks
c\+\+      c              Escape character not working
c++        -              Parser throws an exception [NOTE-1]
c*         c*             Wildcard works perfectly fine
*c         -              Throws an exception (org.apache.lucene.queryParser.TokenMgrError) [NOTE-2]
"c         -              Throws an exception [NOTE-3]
Hello ""   -              Throws an exception
[NOTE-1]: The characters - ( ) { } ! [ ] etc. behave in the same manner
as the "+" shown above.
[NOTE-2]: It looks like a wildcard cannot be the first character of the
query.
[NOTE-3]: I guess this validation can be done after accepting user
input.
My Comments/Questions
=====================
Does that mean that the program should take care of validating the
user input and then pass the query string to QueryParser?
If yes, I guess there might be some more validations that should be
done that I have missed. Can anyone shed some light on the
validations that the program should take care of?
Code
====
import java.io.IOException;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.*;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;

public class TestQueryParser
{
    public static void main(String[] argv)
    {
        try
        {
            // Build a fresh index holding the three test documents.
            IndexWriter writer = new IndexWriter("indexbbs", new StandardAnalyzer(), true);
            Document d1 = new Document();
            d1.add(Field.Text("f1", "c++ hello bharat"));
            writer.addDocument(d1);
            Document d2 = new Document();
            d2.add(Field.Text("f1", "c hello sharat"));
            writer.addDocument(d2);
            Document d3 = new Document();
            d3.add(Field.Text("f1", "hello bharat"));
            writer.addDocument(d3);
            writer.optimize();
            writer.close();

            // Read the query string from standard input.
            String qString = "";
            try
            {
                BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
                System.out.print("Input for f1: ");
                qString = in.readLine();
            }
            catch(Exception e)
            { System.out.println("Exiting..." + e.getMessage()); return; }
            System.out.println("");

            // Parse the input against field f1 and print the resulting query.
            Searcher searcher = new IndexSearcher("indexbbs");
            Analyzer analyzer = new StandardAnalyzer();
            QueryParser qp = new QueryParser("f1", analyzer);
            Query query = qp.parse(qString);
            System.out.println("QueryInput:" + qString);
            System.out.println("QueryCreated:" + query.toString("f1"));

            // Run the search and list every matching document.
            Hits hits = searcher.search(query);
            for (int i=0; i<hits.length(); ++i)
                System.out.println("Document = f1:" + hits.doc(i).get("f1"));
            searcher.close();
        }
        catch(Exception e)
        { System.out.println("Exception:" + e.getMessage()); return; }
    }
}
Thanks
Bharat
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: HELP in QueryParsing !!
Posted by Victor Hadianto <vi...@nuix.com.au>.
> Input: QueryCreated Remarks
> c\+\+ c (Escape character not working)
The StandardTokenizer and QueryParser will drop the ++ signs. This problem is
similar to one in a recent thread. Search the archive for the following subject:
'-' character not interpreted correctly in field names
You may be able to implement a solution similar to the one that I've posted.
Actually, your query got me interested. I've tried my solution for c-- and the
-- signs are dropped. This is because I define DASHESWORD as
| <DASHESWORD: <ALPHANUM> ("-" <ALPHANUM>)+ >
This will search for t-shirt, but not tshirt-. Yet another QueryParser
peculiarity :)
If you absolutely have to search for c++, then I suggest you define another
token which encompasses alphanumeric words and the plus sign. For example
(modify StandardTokenizer.jj):
<MYTOKEN: (<ALPHANUM>|"+")+ >
and add the line:
token = <MYTOKEN>
in the next() method. This may work.
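To see which strings each of the two token definitions above would accept as a single token, here is a rough sketch using java.util.regex as a stand-in for the JavaCC grammar. The class name TokenSketch is made up for illustration, and ALPHANUM is approximated as ASCII letters and digits, which is narrower than the real definition in StandardTokenizer.jj. It only checks whole strings; it does not reproduce how the tokenizer picks between competing tokens.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: approximates two token rules with regexes.
class TokenSketch {
    // Assumption: ASCII-only stand-in for <ALPHANUM>.
    static final String ALPHANUM = "[A-Za-z0-9]+";

    // <DASHESWORD: <ALPHANUM> ("-" <ALPHANUM>)+ >
    static final Pattern DASHESWORD =
        Pattern.compile(ALPHANUM + "(-" + ALPHANUM + ")+");

    // <MYTOKEN: (<ALPHANUM>|"+")+ >, which under the ASCII approximation
    // is the same as one or more letters, digits or plus signs.
    static final Pattern MYTOKEN = Pattern.compile("[A-Za-z0-9+]+");

    // True if the whole string would qualify as one DASHESWORD token.
    static boolean isDashesWord(String s) {
        return DASHESWORD.matcher(s).matches();
    }

    // True if the whole string would qualify as one MYTOKEN token.
    static boolean isMyToken(String s) {
        return MYTOKEN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"t-shirt", "tshirt-", "c--", "c++", "c"};
        for (String s : samples) {
            System.out.println(s + "  DASHESWORD=" + isDashesWord(s)
                + "  MYTOKEN=" + isMyToken(s));
        }
    }
}
```

Every dash in DASHESWORD must be followed by another alphanumeric run, which is why a trailing dash (tshirt-) or a doubled dash (c--) falls outside it, while MYTOKEN allows plus signs anywhere in the token.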
> c++ - (Parser throws an exception) [NOTE-1]
As expected.
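If the goal is only to stop the parser from throwing on such input, one option is to backslash-escape the special characters before calling parse(). Below is a minimal sketch in plain Java; the class name QueryEscaper and the exact character set are assumptions (the set is taken from the characters listed in NOTE-1, plus "+" and the backslash itself), so check it against the syntax of your QueryParser version. Note that escaping only gets the string past the parser: as the c\+\+ row shows, the analyzer may still drop the characters afterwards.

```java
// Hypothetical helper, not part of Lucene: escapes query syntax characters.
class QueryEscaper {
    // Assumption: characters reported in this thread as breaking the parser,
    // plus the backslash so that existing backslashes stay literal.
    static final String SPECIAL = "\\+-(){}![]";

    // Returns the input with a backslash placed before every special character.
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (SPECIAL.indexOf(ch) >= 0) {
                out.append('\\');
            }
            out.append(ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("c++"));    // prints c\+\+
        System.out.println(escape("(a-b)"));  // prints \(a\-b\)
    }
}
```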
> *c - (throws an exception - [NOTE-2]
There have been a number of discussions on this subject; search the mailing list
for more information.
> Does that mean that the program should taken care of validating the
> User input and then pass the query string to QueryParser?
It depends on how you look at it. QueryParser will throw a ParseException if it has
parsing issues, and you can treat this as the validation.
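If you would rather give the user a friendlier message before the string ever reaches QueryParser, a small pre-check can catch the failing inputs from your table. This is only a sketch in plain Java: the class name QueryPrecheck and the list of checks are assumptions drawn from the cases in this thread (*c, "c and Hello ""), and treating a leading "?" like a leading "*" is my own guess. It is not a complete description of what the parser rejects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: flags input known to upset the parser.
class QueryPrecheck {
    // Returns a list of human-readable problems; an empty list means no known problem.
    static List<String> problems(String input) {
        List<String> found = new ArrayList<>();
        String q = (input == null) ? "" : input.trim();
        if (q.isEmpty()) {
            found.add("empty query");
            return found;
        }
        // A term that begins with a wildcard (the *c case; "?" is an assumption).
        for (String term : q.split("\\s+")) {
            char first = term.charAt(0);
            if (first == '*' || first == '?') {
                found.add("term starts with a wildcard: " + term);
            }
        }
        // Count double quotes that are not preceded by a backslash (the "c case).
        int quotes = 0;
        for (int i = 0; i < q.length(); i++) {
            if (q.charAt(i) == '"' && (i == 0 || q.charAt(i - 1) != '\\')) {
                quotes++;
            }
        }
        if (quotes % 2 != 0) {
            found.add("unbalanced double quote");
        }
        // A phrase with nothing between the quotes (the Hello "" case).
        if (q.contains("\"\"")) {
            found.add("empty phrase");
        }
        return found;
    }

    public static void main(String[] args) {
        String[] samples = {"*c", "\"c", "Hello \"\"", "c*", "hello bharat"};
        for (String s : samples) {
            System.out.println(s + " -> " + problems(s));
        }
    }
}
```

You still need to catch ParseException around parse(), since a pre-check like this cannot cover every case. Also, the TokenMgrError you saw is an Error rather than an Exception, so the catch(Exception e) block in your program will not see it; it needs its own catch clause if you want to report it to the user.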
HTH,
victor