Posted to java-user@lucene.apache.org by "G.Long" <jd...@gmail.com> on 2012/02/13 17:38:42 UTC

query performance with leading *

Hi,

Is there a way to improve query performance when using a leading * as a 
wildcard on a path property?

I have hundreds of queries to run on a Lucene index (~250 MB). Executing 
those queries without the leading * is about 5x faster than with the 
leading *. My problem is that I sometimes need to use the leading *.

Most of the queries have the full path as a parameter, but some of them 
have only part of it.

The queries look like:
"+projet:CCOM +path:*/folder5/folder6/folder_ab/

I'm using Lucene 3.1.0.

Regards,
Gary


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query performance with leading *

Posted by "G.Long" <jd...@gmail.com>.
Thank you for the tips.

Is there an analyzer which uses this tokenizer? If not, do you know of any 
tutorial which explains how to implement a custom analyzer? I didn't find 
any.

Regards.

On 13/02/2012 17:46, Robert Muir wrote:
> I think you can solve this with the tokenizers in the
> org.apache.lucene.analysis.path package (in lucene-analyzers.jar)
>
> In your case, looks like ReversePathHierarchyTokenizer might be what
> you want, though you will need to upgrade to at least 3.2 to get it.


Re: query performance with leading *

Posted by Robert Muir <rc...@gmail.com>.
I think you can solve this with the tokenizers in the
org.apache.lucene.analysis.path package (in lucene-analyzers.jar).

In your case, it looks like ReversePathHierarchyTokenizer might be what
you want, though you will need to upgrade to at least 3.2 to get it.
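
To make the idea concrete, here is a minimal plain-Java sketch of the kind
of suffix tokens a reverse path tokenizer is meant to produce. It is not
the Lucene class and does not use the Lucene API; the class name
PathSuffixSketch and the method suffixes are made up for illustration, and
the exact tokens emitted by ReversePathHierarchyTokenizer should be checked
against its Javadoc.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stand-alone approximation, NOT the Lucene tokenizer.
// PathSuffixSketch and suffixes() are hypothetical names.
class PathSuffixSketch {

    // Returns the whole path plus every suffix that starts right after a
    // delimiter, e.g. "/a/b/c" gives "/a/b/c", "a/b/c", "b/c", "c".
    static List<String> suffixes(String path, char delimiter) {
        List<String> tokens = new ArrayList<String>();
        if (path.isEmpty()) {
            return tokens;
        }
        tokens.add(path);
        // Stop before the last character so that a trailing delimiter
        // does not produce an empty token.
        for (int i = 0; i < path.length() - 1; i++) {
            if (path.charAt(i) == delimiter) {
                tokens.add(path.substring(i + 1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "/root" is an assumed parent folder, not taken from the thread.
        for (String token : suffixes("/root/folder5/folder6/folder_ab/", '/')) {
            System.out.println(token);
        }
    }
}
```

If the path field were indexed with tokens like these, a partial path such
as folder5/folder6/folder_ab/ could be looked up as an exact term instead
of a leading-wildcard pattern, at the cost of a larger index.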




-- 
lucidimagination.com



RE: query performance with leading *

Posted by "Austin, Carl" <Ca...@baesystemsdetica.com>.
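You could tokenize the value both forwards and in reverse, for
example:

123456 and 654321

You can then rewrite a query for *56 as 65* and run it against the
reversed form. A trailing wildcard only has to visit the terms that share
the prefix, whereas a leading wildcard has to scan every term in the
field, so this should increase performance.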
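
A minimal sketch of that rewrite in plain Java, independent of any Lucene
API. The class and method names are hypothetical, and it assumes the
reversed value is indexed in its own field so that the rewritten pattern
can be run against it.

```java
// Illustration only: ReverseWildcardSketch and its methods are hypothetical
// helpers, not part of Lucene.
class ReverseWildcardSketch {

    // Reverses a term, e.g. "123456" becomes "654321".
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Turns a pattern whose only wildcard is a single leading '*' into a
    // trailing-wildcard pattern for the reversed field, e.g. "*56" -> "65*".
    // Any other pattern is returned unchanged.
    static String toReversedPrefixPattern(String pattern) {
        boolean onlyLeadingStar = pattern.length() > 1
                && pattern.charAt(0) == '*'
                && pattern.indexOf('*', 1) < 0
                && pattern.indexOf('?') < 0;
        return onlyLeadingStar ? reverse(pattern.substring(1)) + "*" : pattern;
    }

    public static void main(String[] args) {
        System.out.println(reverse("123456"));
        System.out.println(toReversedPrefixPattern("*56"));
    }
}
```

The application would have to pick the field per query: patterns with a
leading * go to the reversed field after the rewrite, everything else goes
to the forward field unchanged.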
