
Posted to java-user@lucene.apache.org by Parag Dharmadhikari <pa...@bsil.com> on 2002/01/12 11:02:35 UTC

about parse method

Hi all,

What are the different fields that one can use in the parse method of
QueryParser? I ask because I would like to use the parse method to search
file names instead of searching the contents of the files.

regards
parag


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Trying to create NEAR option in QueryParser -- ideas/help

Posted by ca...@bookandhammer.com.

Thanks for the help Brian,


On Sunday, January 13, 2002, at 05:11 PM, Brian Goetz wrote:

>
>> First I have to say, Brian Goetz has done an awesome job putting this 
>> together.
>
> A great way to get my attention, thanks!  :)
>
>> Overview of how I think the QueryParser works:
>> The basis for the QueryParser is to break everything up into the 
>> appropriate type of Query (TermQuery, PhraseQuery, ...) by matching 
>> the query pattern, then to combine these queries into a collection of 
>> BooleanClauses and finally BooleanQueries.
>
> That's the theory.  Unfortunately, we cheat a little bit.
>
>> Now the NEAR option will only work with phrase searches since that is 
>> the only place where you can set the slop factor.
>
> Right.
>
>> So I had two thoughts.
>> 1) Create a new pattern <TERM> "NEAR"<NUM>+ <TERM>
>> The problem I have with this, is that I don't think it will work since 
>> the parser will see <TERM> and never look for the "NEAR" option.
>
> That's one of the real problems with writing two-level parsers with 
> tools like JavaCC -- the tokenizer is completely separate from the 
> parser.  The current parser recognizes the tokens AND, OR, and NOT 
> only in uppercase, because (a) most users type in lower case 
> and/or mixed case, and (b) these are likely to be stop words (or should 
> be) anyway.  NEAR is more dangerous, since it's a useful word, but 
> the upper-case rule might be enough.
>
>> 2) Retroactively make a TermQuery a PhraseQuery with a set slop.
>
> This is a better strategy.
>
> I had been planning on overhauling this myself, but that got stalled 
> because I didn't want to take it on until we had a clear statement of 
> what the goal and functionality of the query parser should be, and a 
> syntax that made sense for all the various features.
>
>
>
> --
> Brian Goetz
> Quiotix Corporation
> brian@quiotix.com           Tel: 650-843-1300            Fax: 
> 650-324-8032
>
> http://www.quiotix.com
>

Re: Trying to create NEAR option in QueryParser -- ideas/help

Posted by Brian Goetz <br...@quiotix.com>.

>First I have to say, Brian Goetz has done an awesome job putting this 
>together.

A great way to get my attention, thanks!  :)

>Overview of how I think the QueryParser works:
>The basis for the QueryParser is to break everything up into the 
>appropriate type of Query (TermQuery, PhraseQuery, ...) by matching the 
>query pattern, then to combine these queries into a collection of 
>BooleanClauses and finally BooleanQueries.

That's the theory.  Unfortunately, we cheat a little bit.

>Now the NEAR option will only work with phrase searches since that is the 
>only place where you can set the slop factor.

Right.

>So I had two thoughts.
>1) Create a new pattern <TERM> "NEAR"<NUM>+ <TERM>
>The problem I have with this, is that I don't think it will work since the 
>parser will see <TERM> and never look for the "NEAR" option.

That's one of the real problems with writing two-level parsers with tools 
like JavaCC -- the tokenizer is completely separate from the parser.  The 
current parser recognizes the tokens AND, OR, and NOT only in uppercase, 
because (a) most users type in lower case and/or mixed case, and 
(b) these are likely to be stop words (or should be) anyway.  NEAR is 
more dangerous, since it's a useful word, but the upper-case rule might be 
enough.
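
To make the upper-case rule concrete, here is a minimal sketch in plain 
Java. It is not the JavaCC grammar from QueryParser.jj; the class 
OperatorTokens and its methods are hypothetical names used only for this 
illustration. It treats AND, OR, NOT, and the proposed NEAR as operators 
only when they are typed in upper case, so a lower-case "near" is still an 
ordinary search term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy classifier, not Lucene code: a whitespace-separated token counts as an
// operator only when it exactly matches one of the upper-case keywords.
final class OperatorTokens {
    // NEAR is the proposed addition (an assumption of this sketch); the other
    // three are the keywords the thread says the parser already recognizes.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT", "NEAR"));

    // Case-sensitive on purpose: "near" and "Near" are plain terms.
    static boolean isOperator(String token) {
        return OPERATORS.contains(token);
    }

    // Labels each token of the query as "OP:<token>" or "TERM:<token>".
    static List<String> classify(String query) {
        List<String> labelled = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query yields a single empty token
            }
            labelled.add((isOperator(token) ? "OP:" : "TERM:") + token);
        }
        return labelled;
    }

    public static void main(String[] args) {
        System.out.println(classify("hotels NEAR airport"));
        System.out.println(classify("hotels near airport"));
    }
}
```

The trade-off is the one described above: a user who really wants to search 
for the upper-case word NEAR cannot do so without some escaping mechanism, 
which this sketch does not attempt.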

>2) Retroactively make a TermQuery a PhraseQuery with a set slop.

This is a better strategy.

I had been planning on overhauling this myself, but that got stalled 
because I didn't want to take it on until we had a clear statement of 
what the goal and functionality of the query parser should be, and a 
syntax that made sense for all the various features.



--
Brian Goetz
Quiotix Corporation
brian@quiotix.com           Tel: 650-843-1300            Fax: 650-324-8032

http://www.quiotix.com



Trying to create NEAR option in QueryParser -- ideas/help

Posted by ca...@bookandhammer.com.

Hi,

I am trying to implement a NEAR option in the QueryParser.jj.

First I have to say, Brian Goetz has done an awesome job putting this 
together.
I rely on it all the time. It's solid, and also very complex, especially 
if you don't know JavaCC.

Now to my question.

Overview of how I think the QueryParser works:
The basis for the QueryParser is to break everything up into the 
appropriate type of Query (TermQuery, PhraseQuery, ...) by matching the 
query pattern, then to combine these queries into a collection of 
BooleanClauses and finally BooleanQueries.

Now the NEAR option will only work with phrase searches since that is 
the only place where you can set the slop factor.
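
As a rough illustration of what a slop-style tolerance buys you, the 
following plain-Java sketch checks whether two terms occur within a given 
number of positions of each other in a tokenized text. It is a simplified, 
order-insensitive model written for this message, not Lucene's PhraseQuery 
matching; ProximitySketch, positions, and near are hypothetical names, and 
the whitespace tokenization is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Toy proximity check, not Lucene code.
final class ProximitySketch {
    // Returns the 0-based positions at which term occurs in tokens.
    static List<Integer> positions(String[] tokens, String term) {
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                found.add(i);
            }
        }
        return found;
    }

    // True if some occurrence of a and some occurrence of b have at most
    // maxGap other tokens between them, in either order. The text is
    // lower-cased, so a and b are expected in lower case.
    static boolean near(String text, String a, String b, int maxGap) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pa : positions(tokens, a)) {
            for (int pb : positions(tokens, b)) {
                if (pa != pb && Math.abs(pa - pb) - 1 <= maxGap) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps";
        System.out.println(near(text, "quick", "fox", 1));
        System.out.println(near(text, "quick", "fox", 0));
    }
}
```

With maxGap set to 0 this behaves like an exact two-word phrase match in 
either order; larger values loosen it, which is the behaviour a NEAR 
operator would expose to the user.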

So I had two thoughts.
1) Create a new pattern <TERM> "NEAR"<NUM>+ <TERM>
The problem I have with this, is that I don't think it will work since 
the parser will see <TERM> and never look for the "NEAR" option.

2) Retroactively make a TermQuery a PhraseQuery with a set slop.
This is somewhat how the AND conjunction works. I propose taking 
the previous BooleanClause and checking whether its query is a 
TermQuery. If so, extract the Term and replace the query in the 
BooleanClause with a new PhraseQuery. However, in trying to do this I 
find that I cannot extract the Term from the TermQuery, because 
TermQuery has no getTerm() method. I don't think it would be difficult to 
add, but there might be other issues.
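
Idea 2 could look roughly like the sketch below. The nested classes are 
stand-ins defined here so that the example compiles on its own; they only 
mimic the shape of Lucene's TermQuery, PhraseQuery, and BooleanClause and 
are not the real classes. In particular, getTerm() is the accessor that 
would have to be added, a plain String stands in for a Term, and applyNear 
is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "retroactively turn a TermQuery into a PhraseQuery".
final class NearRewriteSketch {
    interface Query {}

    static final class TermQuery implements Query {
        private final String term;
        TermQuery(String term) { this.term = term; }
        // The accessor the message says is missing (assumed here).
        String getTerm() { return term; }
    }

    static final class PhraseQuery implements Query {
        final List<String> terms = new ArrayList<>();
        int slop;
    }

    static final class BooleanClause {
        Query query; // mutable so the parser can swap the query in place
        BooleanClause(Query query) { this.query = query; }
    }

    // Called when the parser meets NEAR<slop> <rightTerm> after it has
    // already emitted a clause for the left-hand term. Returns false, and
    // changes nothing, if the previous clause is not a plain term query.
    static boolean applyNear(List<BooleanClause> clauses, String rightTerm, int slop) {
        if (clauses.isEmpty()) {
            return false;
        }
        BooleanClause previous = clauses.get(clauses.size() - 1);
        if (!(previous.query instanceof TermQuery)) {
            return false;
        }
        PhraseQuery phrase = new PhraseQuery();
        phrase.terms.add(((TermQuery) previous.query).getTerm());
        phrase.terms.add(rightTerm);
        phrase.slop = slop;
        previous.query = phrase; // replace the query inside the clause
        return true;
    }

    public static void main(String[] args) {
        List<BooleanClause> clauses = new ArrayList<>();
        clauses.add(new BooleanClause(new TermQuery("hotels")));
        System.out.println(applyNear(clauses, "airport", 3));
    }
}
```

One open question the sketch leaves untouched is what to do when the 
previous clause is not a TermQuery (for example, a phrase or a nested 
boolean query); here the helper simply refuses.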

Any thoughts or ideas would be helpful.

Thanks

--Peter



Re: about parse method

Posted by ca...@bookandhammer.com.

Hi,

I don't fully understand what you are looking to do, but here is one 
idea.
One technique that the Lucene demo uses is to store the path in the 
index (it can be indexed or not; to search on it, it must be indexed).
So, when you create the document, have one of the fields in the document 
be the path. Fields in a Lucene document don't have to be part of a real 
file; they are just fields that you add to be stored, indexed, or 
tokenized into the index.
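
To show the idea without depending on a particular Lucene version, here is 
a toy model in plain Java. It is not Lucene's API: FieldIndexSketch, 
addDocument, and search are hypothetical names, and the field names "path" 
and "contents" are assumptions chosen for the example. Each field gets its 
own term index, so a search can target the path rather than the contents. 
In Lucene itself, the equivalent step would be to name the path field when 
building the query.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy per-field inverted index, not Lucene code.
final class FieldIndexSketch {
    // field name -> term -> ids of documents with that term in that field
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();
    private int nextId = 0;

    // Indexes every field of the document and returns the new document id.
    int addDocument(Map<String, String> fields) {
        int id = nextId++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            Map<String, Set<Integer>> terms = index.get(field.getKey());
            if (terms == null) {
                terms = new HashMap<>();
                index.put(field.getKey(), terms);
            }
            // Crude tokenization: lower-case, split on non-alphanumerics.
            for (String token : field.getValue().toLowerCase().split("[^a-z0-9]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                Set<Integer> ids = terms.get(token);
                if (ids == null) {
                    ids = new TreeSet<>();
                    terms.put(token, ids);
                }
                ids.add(id);
            }
        }
        return id;
    }

    // Returns the ids of documents whose given field contains the term.
    Set<Integer> search(String field, String term) {
        Map<String, Set<Integer>> terms = index.get(field);
        if (terms == null) {
            return Collections.emptySet();
        }
        Set<Integer> ids = terms.get(term.toLowerCase());
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    public static void main(String[] args) {
        FieldIndexSketch idx = new FieldIndexSketch();
        Map<String, String> doc = new HashMap<>();
        doc.put("path", "/docs/sales.txt");
        doc.put("contents", "meeting notes");
        idx.addDocument(doc);
        System.out.println(idx.search("path", "sales"));
    }
}
```

The same term can match in one field and not in another, which is why the 
field you pass to the parser decides whether you are searching file names 
or file contents.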

I hope this helps.

--Peter

On Saturday, January 12, 2002, at 02:02 AM, Parag Dharmadhikari wrote:

> Hi all,
>
> What are the different fields that one can use in the parse method of
> QueryParser? I ask because I would like to use the parse method to search
> file names instead of searching the contents of the files.
>
> regards
> parag
