Posted to user@lucenenet.apache.org by Omri Suissa <om...@diffdoof.com> on 2013/05/08 18:34:02 UTC

Lucene query parser multiple words

When using the Lucene QueryParser to parse a query with multiple words, I found
that the returned query looks like the following:

*query text:*

word1 word2 word3 word4

*returned query:*

content:word1 content:"word1 word2" content:"word1 word2 word3"
content:"word1 word2 word3 word4"

The logic, as I understand it, is: first word, first + second word, first +
second + third word, etc.

This logic sounds too simple for my needs. For example, if the user searches
for the following query:

myDoc.txt anotherDoc.txt

the user expects to get results with myDoc.txt or with anotherDoc.txt (or
both).

But with the logic described above, the user will only get results for
"myDoc.txt" and for the phrase "myDoc.txt anotherDoc.txt", and not for
anotherDoc.txt (the second word alone).

*My code:*

new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "content",
analyzer).Parse(strQuery);

To "see" the query, I'm using the ToString() method on the query object.

Is there a way to change this behavior? (Or am I missing something here?)


Thanks,

Omri

Re: Lucene query parser multiple words

Posted by Simon Svensson <si...@devhost.se>.
Sending this to the mailing list in case anyone is keeping track. The 
issue was a custom-modified Lucene.Net assembly; the one provided via 
NuGet works as expected.

On 2013-05-09 17:52, Omri Suissa wrote:
> Hi,
> Sorry, I found the problem: it was in a "fix" I had made.
>
> Sorry for all the mess.
>
> Omri
>
>
> On Thu, May 9, 2013 at 6:29 PM, Omri Suissa <omri.suissa@diffdoof.com> wrote:
>
>     I've created a simple WinForms app that does the following:
>     private void button1_Click(object sender, EventArgs e)
>     {
>         string strQuery = textBox1.Text;
>
>         string query = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "content",
>                 new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
>             .Parse(strQuery)
>             .ToString();
>
>         MessageBox.Show(query);
>     }
>
>     When entering the query word1 word2 word3 word4, I got the
>     following result: content:word1 content:"word1 word2"
>     content:"word1 word2 word3".
>
>     You can even download my example app from here:
>     http://filesdemo.clearmash.com/diffdoof.installfiles/stackoverflow/WindowsFormsApplication1.zip
>
>     Thanks!
>
>     Omri Suissa
>     VP R&D, DiffDoof Ltd.
>
>     Tel: +972 9 7724228
>     Cell: +972 54 5395206
>     Fax: +972 9 9512577
>
>     11, Galgaley Haplada Street, P.O.Box 2150
>     Herzlia Pituach 46120, Israel
>     www.DiffDoof.com
>


Re: Lucene query parser multiple words

Posted by Simon Svensson <si...@devhost.se>.
Hi,

This does not match the behavior of QueryParser.

var analyzer = new StandardAnalyzer(Version.LUCENE_30);
var parser = new QueryParser(Version.LUCENE_30, "f", analyzer);
var query = parser.Parse("word1 word2 word3 word4");

It should return a query matching "f:word1 f:word2 f:word3 f:word4", i.e. 
each term is an optional (OR) clause. This can be changed with parser.DefaultOperator = ...

It sounds like you're using some odd analyzer which produces odd tokens. 
Could you provide us with a small test case to reproduce your behavior?

// Simon
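Simon's point about QueryParser's default behavior can be checked with a small console program. This is a sketch assuming the stock Lucene.Net 3.0.3 package from NuGet and its nested `QueryParser.Operator` enum with `OR` and `AND` values; the class and method names (`QueryParserDemo`, `ParseToString`) are hypothetical. It parses the same four-word query once with the OR operator and once with AND, and prints `ToString()` for each, the same way the thread inspects queries.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
// Alias avoids an ambiguous reference with System.Version.
using Version = Lucene.Net.Util.Version;

// Hypothetical demo class; assumes the Lucene.Net 3.0.3 NuGet package.
public static class QueryParserDemo
{
    // Parses text against the "content" field with the given default
    // operator and returns the query's string form.
    public static string ParseToString(string text, QueryParser.Operator op)
    {
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        var parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
        parser.DefaultOperator = op;
        Query query = parser.Parse(text);
        return query.ToString();
    }

    public static void Main()
    {
        // OR: each whitespace-separated word becomes an optional clause.
        Console.WriteLine(ParseToString("word1 word2 word3 word4", QueryParser.Operator.OR));

        // AND: each word becomes a required (+) clause.
        Console.WriteLine(ParseToString("word1 word2 word3 word4", QueryParser.Operator.AND));
    }
}
```

The `Version` alias matters in practice: with both `using System;` and `using Lucene.Net.Util;` in scope, a bare `Version.LUCENE_30` would not compile because `System.Version` and `Lucene.Net.Util.Version` collide.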

On 2013-05-08 18:34, Omri Suissa wrote:
> When using lucene QueryParser to parse a query with multipal words i found
> that the returned query is like the following:
>
> *query text:*
>
> word1 word2 word3 word4
>
> *returned query:*
>
> content:word1 content:"word1 word2" content:"word1 word2 word3"
> content:"word1 word2 word3 word4"
>
> The logic, as i understand it is: first word, first + second word, first +
> second + third word and etc...
>
> This logic sound too simple for my needs. for example if the user search
> the following query:
>
> myDoc.txt anotherDoc.txt
>
> the user except to get results with myDoc.txt or with anotherDoc.txt (or
> both).
>
> but... since the logic that described the user will get only results for
> "myDoc.txt" and "myDoc.txt and anotherDoc.txt" and not about anotherDoc.txt
> (second word alone).
>
> *My code:*
>
> new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "content",
> analyzer).Parse(strQuery);
>
> to "see" the query i'm using the ToString() method on the query object.
>
> There is a way to change this behavior? (or am i missing something here?)
>
>
> Thanks,
>
> Omri
>