Posted to java-user@lucene.apache.org by Mirko Sertic <mi...@web.de> on 2014/09/03 21:41:54 UTC

Question regarding complex queries and long tail suggestions

Hi all,

I am using Lucene 4.9 for a search application.

Now I'd like to create complex queries such as:

"hello world * this is interesting"

This should match every document containing the phrases "hello world" and
"this is interesting" with any number of terms between them. I also want
to go further and write a custom query parser that generates queries like
"hello * world this is * funny * sometimes", and so on. I want to combine
this syntax with wildcards, for instance "hello wor* this * is funny",
where wor* can be expanded to world and so on.
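To pin down the intended semantics of this syntax, here is a minimal plain-Java sketch (not Lucene code) that checks a tokenized document against such a pattern. It assumes that a standalone * is a gap of zero or more terms, that a term ending in * (such as wor*) is a prefix wildcard, and that the phrase segments must occur in the order written. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GapPhraseMatcher {

    /** A term ending in '*' (e.g. "wor*") is a prefix wildcard; anything else must match exactly. */
    static boolean termMatches(String pattern, String token) {
        if (pattern.length() > 1 && pattern.endsWith("*")) {
            return token.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(token);
    }

    /** Splits "hello world * this is interesting" into [[hello, world], [this, is, interesting]]. */
    static List<List<String>> segments(String pattern) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String part : pattern.trim().split("\\s+")) {
            if (part.equals("*")) {            // a standalone * closes the current phrase segment
                if (!current.isEmpty()) {
                    result.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(part);
            }
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    /** True if every segment occurs as consecutive tokens, in order, with any gap between them. */
    static boolean matches(String pattern, List<String> tokens) {
        int from = 0;
        for (List<String> segment : segments(pattern)) {
            int found = -1;
            for (int start = from; start + segment.size() <= tokens.size() && found < 0; start++) {
                boolean ok = true;
                for (int i = 0; i < segment.size() && ok; i++) {
                    ok = termMatches(segment.get(i), tokens.get(start + i));
                }
                if (ok) {
                    found = start;
                }
            }
            if (found < 0) {
                return false;
            }
            from = found + segment.size();     // the next segment must start after this one
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("hello", "world", "and", "so", "on", "this", "is", "interesting");
        System.out.println(matches("hello world * this is interesting", doc)); // true
        System.out.println(matches("hello wor* * this is interesting", doc));  // true
        System.out.println(matches("this is interesting * hello world", doc)); // false
    }
}
```

Taking the earliest match for each segment is sufficient here: because the gaps are unbounded, an earlier match never prevents a later segment from matching. Inside Lucene, the same shape could be expressed positionally, for example as an in-order SpanNearQuery with a large slop over two exact-phrase span queries, rather than by scanning tokens.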

The key question is: do I have to construct this from nested queries,
which can become very complex, or would it be easier to use the
Automaton API and construct an AutomatonQuery from it? Which would
perform better? I have only read about the Automaton API, so code
examples would be great!
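One point that bears on this question: in Lucene, AutomatonQuery matches individual terms against a finite-state automaton (WildcardQuery is a subclass of it in 4.x), so on a tokenized field it can expand wor* but cannot by itself express gaps between phrases; that part still needs positional queries such as phrase or span queries. The plain-Java sketch below is an illustration, not Lucene's API: it shows only the term-level step, expanding a prefix wildcard against a sorted term dictionary. The dictionary contents are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixExpansion {

    /** Returns every dictionary term that starts with the prefix, in sorted order. */
    static List<String> expand(NavigableSet<String> dictionary, String prefix) {
        List<String> out = new ArrayList<>();
        // Seek to the first term >= prefix, then read forward while terms still share the prefix.
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary = new TreeSet<>(
                Arrays.asList("hello", "wonder", "word", "world", "worn", "wot"));
        System.out.println(expand(dictionary, "wor")); // [word, world, worn]
    }
}
```

Because the terms are sorted, all matches for a prefix form one contiguous range, so the loop can stop at the first term that no longer matches.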

Later, I want to create long tail suggestions for this kind of query.
Unfortunately, there is not much documentation available on this.
Are there code examples available?
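As one possible, hypothetical reading of "long tail suggestions", the sketch below counts which term follows each occurrence of a phrase across tokenized documents and proposes the most frequent continuations. It is plain Java over in-memory token lists, not Lucene's suggester API; in an index, the occurrences would come from a phrase or span query rather than a linear scan. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContinuationSuggester {

    /** Suggests "phrase + next term" for the k terms that most often follow the phrase. */
    static List<String> suggest(List<List<String>> docs, List<String> phrase, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            // Stop one short of the end so that a following term always exists.
            for (int start = 0; start + phrase.size() < doc.size(); start++) {
                if (doc.subList(start, start + phrase.size()).equals(phrase)) {
                    counts.merge(doc.get(start + phrase.size()), 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Most frequent first; ties broken alphabetically so the output is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        String prefix = String.join(" ", phrase);
        List<String> suggestions = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            suggestions.add(prefix + " " + ranked.get(i).getKey());
        }
        return suggestions;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("hello", "world", "this", "is", "interesting"),
                Arrays.asList("hello", "world", "this", "is", "funny"),
                Arrays.asList("hello", "world", "again"));
        System.out.println(suggest(docs, Arrays.asList("hello", "world"), 2));
        // [hello world this, hello world again]
    }
}
```

Rare continuations (the long tail) are still counted; raising k simply exposes more of them.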

Thanks in advance,
Mirko

Re: Question regarding complex queries and long tail suggestions

Posted by Mirko Sertic <mi...@web.de>.
Hi all,

Thanks for the links. In the meantime I used a SpanQuery with some
custom syntax highlighting to gather search suggestions. This works
pretty well, apart from some performance issues.

I use IndexReader.getTermVector to get the terms of a single document
and build my suggestion spans from the Spans collected via
SpanQuery.getSpans().

However, this operation seems to be quite slow. Any ideas on how to tune
it? I only need a TermsEnum for the terms matching a span, so I do not
want to iterate over the whole document.
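A term vector is organized by term, not by position, so finding the terms in a position range still means walking the vector's terms and their positions (in Lucene 4.x via TermsEnum and DocsAndPositionsEnum). What can be avoided is repeating that walk for every span. The plain-Java sketch below makes a single pass per document for all of its spans and keeps only positions inside the window the spans cover. The Map stands in for the term vector and the int[] pairs for Spans start()/end() values (end exclusive); both are assumptions for illustration, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpanTermsFromVector {

    /**
     * termPositions: term -> positions in one document (what a term vector with positions holds).
     * spans: {start, end} position pairs, end exclusive, all from that same document.
     * Returns the terms of each span in position order, reading the vector only once.
     */
    static List<List<String>> spanTerms(Map<String, int[]> termPositions, List<int[]> spans) {
        List<List<String>> result = new ArrayList<>();
        if (spans.isEmpty()) {
            return result;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int[] span : spans) {
            min = Math.min(min, span[0]);
            max = Math.max(max, span[1]);
        }
        // Single pass over the vector: keep only positions inside the window [min, max).
        String[] window = new String[max - min];
        for (Map.Entry<String, int[]> entry : termPositions.entrySet()) {
            for (int pos : entry.getValue()) {
                if (pos >= min && pos < max) {
                    window[pos - min] = entry.getKey();
                }
            }
        }
        for (int[] span : spans) {
            List<String> terms = new ArrayList<>();
            for (int pos = span[0]; pos < span[1]; pos++) {
                // Positions without a term (e.g. removed stop words) are skipped.
                if (window[pos - min] != null) {
                    terms.add(window[pos - min]);
                }
            }
            result.add(terms);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical vector for "hello world and this is interesting".
        Map<String, int[]> vector = new HashMap<>();
        vector.put("hello", new int[] {0});
        vector.put("world", new int[] {1});
        vector.put("and", new int[] {2});
        vector.put("this", new int[] {3});
        vector.put("is", new int[] {4});
        vector.put("interesting", new int[] {5});
        List<int[]> spans = Arrays.asList(new int[] {0, 2}, new int[] {3, 6});
        System.out.println(spanTerms(vector, spans)); // [[hello, world], [this, is, interesting]]
    }
}
```

The pass still touches every term in the vector; the saving comes from doing it once per document instead of once per span, and from never materializing positions outside the spans.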

Thanks in advance,
Mirko

On 04.09.2014 01:21, Jack Krupansky wrote:
> Here's the (skimpy) Lucene Javadoc:
> http://lucene.apache.org/core/4_9_0/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html
>
> Take a look at the unit tests:
> http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_10_0/lucene/queryparser/src/test/org/apache/lucene/queryparser/complexPhrase/TestComplexPhraseQuery.java?revision=1622067&view=markup
>
> -- Jack Krupansky


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Question regarding complex queries and long tail suggestions

Posted by Jack Krupansky <ja...@basetechnology.com>.
Here's the (skimpy) Lucene Javadoc:
http://lucene.apache.org/core/4_9_0/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html

Take a look at the unit tests:
http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_10_0/lucene/queryparser/src/test/org/apache/lucene/queryparser/complexPhrase/TestComplexPhraseQuery.java?revision=1622067&view=markup

-- Jack Krupansky

-----Original Message----- 
From: Erick Erickson
Sent: Wednesday, September 3, 2014 7:14 PM
To: java-user
Subject: Re: Question regarding complex queries and long tail suggestions

Take a look at the ComplexPhraseQueryParser here:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Best,
Erick



Re: Question regarding complex queries and long tail suggestions

Posted by Erick Erickson <er...@gmail.com>.
Take a look at the ComplexPhraseQueryParser here:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Best,
Erick
