Posted to java-user@lucene.apache.org by spinergywmy <sp...@gmail.com> on 2006/11/20 12:31:33 UTC

how to search string with words

Hi,

   I wonder how I can search for a string of words within PDF file
contents. For instance,

      I type "third party license" into the search text box, and I'm using
QueryParser:

           String searchString = request.getParameter("txtSearch");

           QueryParser parser = new QueryParser("contents", analyzer);

           Query query = parser.parse(searchString);

      From System.out.println(), I noticed that the query has been
broken up like this:

           contents:third contents:party contents:license

      So, I'm wondering how I can make the query become
contents:"third party license", to make the search more accurate.
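One common way to get a phrase match is to put the words in double quotes, which QueryParser turns into a single phrase query (all words adjacent and in order) instead of one optional clause per word. A minimal sketch, assuming the raw text comes straight from the `txtSearch` parameter; `PhraseInput` and `toPhrase` are hypothetical names, not part of Lucene:

```java
import java.util.regex.Pattern;

/** Hypothetical helper: wrap raw search-box text in quotes so QueryParser sees one phrase. */
public class PhraseInput {

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String toPhrase(String raw) {
        // Drop quotes and backslashes so the user cannot end the phrase early
        // or inject escape sequences, then collapse runs of whitespace.
        String cleaned = raw.replace('"', ' ').replace('\\', ' ').trim();
        cleaned = WHITESPACE.matcher(cleaned).replaceAll(" ");
        if (cleaned.isEmpty()) {
            throw new IllegalArgumentException("empty search string");
        }
        return "\"" + cleaned + "\"";
    }

    public static void main(String[] args) {
        // Prints: "third party license"
        System.out.println(toPhrase("third party license"));
    }
}
```

Passing the result to `parser.parse(...)` instead of the raw `searchString` should give a query that prints as something like `contents:"third party license"`. Stripping quotes is a deliberate simplification: it means users cannot type their own query syntax.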

   Thanks.

regards,
Wooi Meng
-- 
View this message in context: http://www.nabble.com/how-to-search-string-with-words-tf2668490.html#a7440978
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: how to search string with words

Posted by spinergywmy <sp...@gmail.com>.
Hi,

   Thanks, Martin. I have one question: what does the slop do in a
SpanNearQuery? What is the difference between 0 and 1? In the Lucene source
I have seen an example that sets the slop to 4. Could you please explain that
to me? Thanks.
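To illustrate what the slop controls, here is a simplified model in plain Java (not Lucene's actual span-matching code) of an in-order SpanNearQuery built from single-term spans: the terms must appear in the given order, and the number of extra positions between the first and the last term may not exceed the slop. Under this model a slop of 0 means the words must be adjacent, 1 tolerates one intervening word, and 4 tolerates up to four. `SlopDemo` and `nearInOrder` are hypothetical names:

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of an ordered SpanNearQuery over single-term spans (not Lucene code). */
public class SlopDemo {

    public static boolean nearInOrder(List<String> tokens, List<String> terms, int slop) {
        int n = terms.size();
        if (n == 0) {
            return false;
        }
        for (int start = 0; start < tokens.size(); start++) {
            if (!tokens.get(start).equals(terms.get(0))) {
                continue;
            }
            // Greedily take the earliest following occurrence of each next term;
            // this gives the shortest in-order match that begins at `start`.
            int pos = start;
            boolean complete = true;
            for (int t = 1; t < n && complete; t++) {
                int next = pos + 1;
                while (next < tokens.size() && !tokens.get(next).equals(terms.get(t))) {
                    next++;
                }
                if (next == tokens.size()) {
                    complete = false;
                } else {
                    pos = next;
                }
            }
            if (!complete) {
                return false; // a later start cannot complete the match either
            }
            // Positions covered by the match minus the positions of the terms themselves.
            int gap = (pos - start + 1) - n;
            if (gap <= slop) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("third", "party", "software", "license");
        List<String> query = Arrays.asList("third", "party", "license");
        System.out.println("slop 0: " + nearInOrder(doc, query, 0)); // slop 0: false
        System.out.println("slop 1: " + nearInOrder(doc, query, 1)); // slop 1: true
    }
}
```

In this example "software" sits between "party" and "license", so only a slop of at least 1 lets the document match.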

regards,
Wooi Meng
-- 
View this message in context: http://www.nabble.com/how-to-search-string-with-words-tf2668490.html#a7471415
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: how to search string with words

Posted by Martin Braun <mb...@uni-hd.de>.
spinergywmy wrote:
> Hi Erick,
> 
>    I did take a look at the link that you provided, and I tried it myself,
> but I got no results.
> 
>    My search string is "third party license readme"
> 
Hmm, at a quick glance I would suggest splitting the string into
individual terms and then building a SpanNearQuery from those terms:

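    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanFirstQuery;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    // split on runs of whitespace; lowercase to match an analyzer such as
    // StandardAnalyzer, which lowercases terms at index time
    String[] que_ary = system_query.trim().toLowerCase().split("\\s+");
    // => array with third, party, license, readme
    SpanQuery[] spanq_ar = new SpanQuery[que_ary.length];

    for (int i = 0; i < que_ary.length; i++) {
        // "TI" is the field in my index; use your own field name here
        spanq_ar[i] = new SpanTermQuery(new Term("TI", que_ary[i]));
    }
    // now we have an array of SpanTermQuerys

    // each term of the sentence should be in exact order => SpanNearQuery
    // (I am not sure whether a slop of 0 would be better)
    SpanFirstQuery sfq = new SpanFirstQuery(
            new SpanNearQuery(spanq_ar, 1, true), spanq_ar.length);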


hth,
martin
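A caveat about combining the two: SpanFirstQuery(match, end) only accepts matches whose end position is at most `end`. With `end = spanq_ar.length`, any gap between the terms pushes the end past `end`, so the phrase has to start at the very first position of the field and the slop of 1 effectively behaves like 0. The plain-Java model below (a simplification, not Lucene's code; `SpanFirstDemo` and `spanFirstNear` are hypothetical names) shows this:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model of SpanFirstQuery(new SpanNearQuery(terms, slop, true), end):
 * the terms must occur in order with at most `slop` extra positions between them,
 * and the match must end (exclusive) at or before position `end`. Not Lucene code.
 */
public class SpanFirstDemo {

    public static boolean spanFirstNear(List<String> tokens, List<String> terms, int slop, int end) {
        int n = terms.size();
        if (n == 0) {
            return false;
        }
        // A match starting at `start` ends at start + n or later, so stop once that exceeds `end`.
        for (int start = 0; start < tokens.size() && start + n <= end; start++) {
            if (!tokens.get(start).equals(terms.get(0))) {
                continue;
            }
            // Earliest in-order occurrence of each following term.
            int pos = start;
            for (int t = 1; t < n && pos < tokens.size(); t++) {
                pos++;
                while (pos < tokens.size() && !tokens.get(pos).equals(terms.get(t))) {
                    pos++;
                }
            }
            if (pos >= tokens.size()) {
                return false; // the terms do not all occur in order after this start
            }
            int matchEnd = pos + 1;         // exclusive end position of the match
            int gap = matchEnd - start - n; // extra positions inside the match
            if (gap <= slop && matchEnd <= end) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("third", "party", "license", "readme");
        List<String> doc = Arrays.asList("third", "party", "software", "license", "readme");
        System.out.println(spanFirstNear(doc, query, 1, query.size()));     // false: the gap pushes the end to 5
        System.out.println(spanFirstNear(doc, query, 1, query.size() + 1)); // true
    }
}
```

For full-text fields such as PDF contents, where the phrase can occur anywhere, this suggests a plain SpanNearQuery (or a phrase query) may fit better than wrapping it in SpanFirstQuery.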



Re: how to search string with words

Posted by spinergywmy <sp...@gmail.com>.
Hi Erick,

   I did take a look at the link that you provided, and I tried it myself,
but I got no results.

   My search string is "third party license readme".

   Below is the code that I wrote; please point out where I have gone
wrong.

      readerA = IndexReader.open(DsConstant.indexDir);
      readerB = IndexReader.open(DsConstant.idxCompDir);

      // building the searchables
      Searcher[] searchers = new Searcher[2];

      // VITAL STEP: adding the searcher for the empty index first, before
      // the searcher for the populated index
      searchers[0] = new IndexSearcher(readerA);
      searchers[1] = new IndexSearcher(readerB);

      Analyzer analyzer = new StandardAnalyzer();
      QueryParser parser = new QueryParser(DsConstant.idxFileContent, analyzer);

      // field = the field I search on, based on what I have indexed
      SpanTermQuery stq = new SpanTermQuery(new Term(field, buff.toString()));
      // searchString1 = "third party license readme"
      SpanFirstQuery sfq = new SpanFirstQuery(stq, searchString1.length);

      sfq = (SpanFirstQuery) sfq.rewrite(readerA);
      sfq = (SpanFirstQuery) sfq.rewrite(readerB);

      // creating the multiSearcher
      Searcher mSearcher = getMultiSearcherInstance(searchers);

      searchHits = mSearcher.search(sfq);

   The System.out output is below:

      span first query is ::: spanFirst(TestC:TestC:Third Party License Readme, 32)
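Two things in the code above are worth checking. `new Term(field, buff.toString())` makes the whole string a single term, but an analyzer such as StandardAnalyzer indexes the words separately and lowercased, so one term "Third Party License Readme" has nothing to match. And the `end` argument of SpanFirstQuery counts term positions, not characters. A minimal sketch, assuming a rough lowercase-and-split-on-whitespace stand-in for the analyzer (StandardAnalyzer also handles punctuation and stop words, which this ignores); `QueryTerms` and `splitTerms` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: split a search string into the terms an analyzer would roughly produce. */
public class QueryTerms {

    /** Rough stand-in for the analyzer: lowercase, then split on whitespace. */
    public static List<String> splitTerms(String searchString) {
        List<String> terms = new ArrayList<>();
        for (String word : searchString.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(word);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String searchString1 = "Third Party License Readme";
        List<String> terms = splitTerms(searchString1);
        System.out.println(terms);                  // [third, party, license, readme]
        System.out.println(searchString1.length()); // 26 (characters)
        System.out.println(terms.size());           // 4 (terms, one SpanTermQuery each)
    }
}
```

Each of these terms would then get its own SpanTermQuery, as in Martin's reply, rather than one SpanTermQuery for the whole string.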
-- 
View this message in context: http://www.nabble.com/how-to-search-string-with-words-tf2668490.html#a7469644
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: how to search string with words

Posted by spinergywmy <sp...@gmail.com>.
Hi guys,

   I have a problem searching all fields (metadata) using SpanFirstQuery.

      If I search on just one field using SpanFirstQuery, there is no
problem. However, if I have to search all fields, I get no results.

      For example, when I search on ALL (id, name, desc, owner, createdBy),
I put OR between the clauses, so the query looks like id:1 OR name:first
thing OR desc:first thing OR owner:first thing OR createdBy:first thing.
When I print it with System.out, I see this whole query inside my spanNear,
and that returns no results. If I search only on name:first thing, I don't
have any problem.

      Is there any solution for my case?
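Two points may explain this. In QueryParser syntax, `name:first thing` applies the field only to the word right after the colon, so `thing` goes to the default field. And all clauses of one SpanNearQuery must be in the same field, so a single span query cannot match across id, name, desc and so on. One way around it, sketched here under the assumption that the query is built as a QueryParser string, is to quote the phrase once per field and OR the fields together; `MultiFieldPhrase` is a hypothetical helper. With span queries, the equivalent would be one SpanNearQuery per field, each added to a BooleanQuery with `BooleanClause.Occur.SHOULD`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: build a QueryParser string that searches one phrase in several fields. */
public class MultiFieldPhrase {

    public static String build(List<String> fields, String phrase) {
        // Remove embedded quotes and backslashes, then collapse whitespace.
        String cleaned = phrase.replace('"', ' ').replace('\\', ' ').trim().replaceAll("\\s+", " ");
        String quoted = "\"" + cleaned + "\"";
        StringJoiner query = new StringJoiner(" OR ");
        for (String field : fields) {
            query.add(field + ":" + quoted); // e.g. name:"first thing"
        }
        return query.toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("name", "desc", "owner", "createdBy");
        // Prints: name:"first thing" OR desc:"first thing" OR owner:"first thing" OR createdBy:"first thing"
        System.out.println(build(fields, "first thing"));
    }
}
```

An exact-value clause such as `id:1` can simply be prepended with `" OR "`.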

      Thanks.

regards,
Wooi Meng
-- 
View this message in context: http://www.nabble.com/how-to-search-string-with-words-tf2668490.html#a7484264
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: how to search string with words

Posted by Erick Erickson <er...@gmail.com>.
From a reply by Martin:


Please refer to the answers to my question on this list:
http://www.nabble.com/forum/ViewPost.jtp?post=7337585&framed=y

In short: SpanFirstQuery works like a charm :)

hth,


On 11/20/06, spinergywmy <sp...@gmail.com> wrote:
>
>
> Hi,
>
>    I wonder how I can perform search on string of words within PDF file
> contents, for instance,
>
>       I type "third party license" in the search text box, I'm using
> QueryParser:
>
>            String searchString = request.getParameter("txtSearch");
>
>            QueryParser parser = new QueryParser("contents", analyzer);
>
>            Query query = parser.parse(searchString);
>
>       From the system.out.println(), I noticed that the query has been
> broken like below:
>
>            contents:third contents:party contents:license
>
>       So, I m wondering how can I make the query becomes contents:third
> party license? And make the searching more accurate.
>
>    Thanks.
>
> regards,
> Wooi Meng