Posted to users@jackrabbit.apache.org by Dennis van der Laan <d....@rug.nl> on 2010/01/11 13:07:33 UTC

Re: Searching for a property

Hello Ard,

We rewrote a part of our virtual path handling, and now store both the
virtual path itself, and the lower-case equivalent (we really need the
not-lowercased path). All queries are now done on the lowercased virtual
path and indeed (!) everything stays fast, even after a million virtual
paths. We'll try to keep away from the lower-case function and similar
functions.
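
For readers who want to try the same approach, here is a minimal sketch of the idea in plain Java. It is not our actual code: the class name, the property names `virtualPath` and `virtualPathLower`, and the shape of the XPath constraint are all assumptions made for illustration. The point is only that the lowercasing happens once at write time and once on the query input, so the query itself is a plain equality test without fn:lower-case.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; the property names are illustrative, not from the thread.
final class VirtualPathFields {
    static final String ORIGINAL = "virtualPath";
    static final String LOWERED = "virtualPathLower";

    // Lowercase with a fixed locale so the stored and queried forms always match.
    static String normalize(String path) {
        return path.toLowerCase(Locale.ROOT);
    }

    // Properties to write on the node: the original path plus its lowercased copy.
    static Map<String, String> propertiesFor(String path) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(ORIGINAL, path);
        props.put(LOWERED, normalize(path));
        return props;
    }

    // Equality constraint on the lowercased copy; no fn:lower-case in the query.
    // Single quotes in the literal are escaped by doubling them.
    static String constraintFor(String userInput) {
        String literal = normalize(userInput).replace("'", "''");
        return "//*[@" + LOWERED + " = '" + literal + "']";
    }

    public static void main(String[] args) {
        System.out.println(propertiesFor("/Docs/Report.PDF"));
        System.out.println(constraintFor("/docs/REPORT.pdf"));
    }
}
```

In a real repository the two map entries would be written as two properties on the same node, and the constraint string would be passed to the query manager.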

Thanks very much for all your help!

Dennis

Ard Schrijvers wrote:
> On Thu, Dec 17, 2009 at 10:59 PM, Dennis van der Laan
> <d....@rug.nl> wrote:
>   
>> Dennis van der Laan wrote:
>>     
>
>   
>> See the increase of time spent on the execution: 400+ ms instead of 7ms.
>> And this is not a single incident, I see this increase on all queries
>> like the above.
>>
>> The memory of the JVM should not be a problem, it's set to 2Gb and only
>> 800Mb is used at the moment the queries are slow. Restarting the
>> application does not help either.
>>     
>
> No, this seems logical to me. The memory is consumed by internal
> lucene term enums. I am quite sure what your issue is, but did not
> test it, nor ever tried it myself. But, I have always wondered *how*
> the fn:lower-case could have been implemented efficiently in
> Jackrabbit. It doesn't fit into my understanding of how inverted
> indexes work, which is what Lucene is in the end. So, I am happy that my
> understanding was correct, and unhappy that fn:lower-case does (again,
> from the top of my head and looking at code only) not scale too well.
>
> I think in your setup a lot of time is spent in the CaseTermQuery,
> which first traverses all your 1 million virtualpaths and lowercases
> them. This cannot scale (neither in CPU nor in memory).
>
> So, would you like to give me an indication about the query execution
> time without the fn:lower-case? I think it will drop to < 1 ms.
>
> I think you should try to get away without using fn:lower-case if
> this works for you. Just make sure that you always store the virtualpath
> property as lower-case: then you are fine.
>
>   
>> Again, any help will be appreciated.
>>     
>
> let me know if this helped,
>
> Regards Ard
>
>   
>> Dennis
>>
>>     
>>>> Furthermore, of course, index size matters as well
>>>>
>>>>         


-- 
Dennis van der Laan


Re: Searching for a property

Posted by Ard Schrijvers <a....@onehippo.com>.
On Mon, Jan 11, 2010 at 1:07 PM, Dennis van der Laan
<d....@rug.nl> wrote:
> Hello Ard,
>
> We rewrote a part of our virtual path handling, and now store both the
> virtual path itself, and the lower-case equivalent (we really need the
> not-lowercased path). All queries are now done on the lowercased virtual
> path and indeed (!) everything stays fast, even after a million virtual
> paths. We'll try to keep away from the lower-case function and similar
> functions.

As long as it is a single term lookup in Lucene, it is always fast,
almost regardless of the number of terms there are.
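
A toy illustration of the difference, assuming nothing about Lucene's real data structures: the class below is a hypothetical in-memory term index built on a sorted map. An exact lookup probes the map once, while a case-insensitive match over original-case terms has to lowercase and compare every term, which is roughly the cost pattern described for fn:lower-case earlier in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: term -> ids of documents containing it. Not Lucene itself.
final class ToyTermIndex {
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();
    int termsVisited; // number of terms inspected by the most recent lookup

    void add(String term, int docId) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    // Single term lookup: one probe of the sorted map, no enumeration of terms.
    List<Integer> exact(String term) {
        termsVisited = 1;
        return postings.getOrDefault(term, Collections.<Integer>emptyList());
    }

    // Case-insensitive match on original-case terms: every term is lowercased.
    List<Integer> lowerCaseScan(String loweredTerm) {
        termsVisited = 0;
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            termsVisited++;
            if (e.getKey().toLowerCase(Locale.ROOT).equals(loweredTerm)) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }
}
```

With a million stored paths, `lowerCaseScan` inspects a million terms per query, whereas `exact` on a lowercased copy of the field does a single lookup.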

>
> Thanks very much for all your help!

You're welcome,

Ard

>
> Dennis
>
> Ard Schrijvers wrote:
>> On Thu, Dec 17, 2009 at 10:59 PM, Dennis van der Laan
>> <d....@rug.nl> wrote:
>>
>>> Dennis van der Laan wrote:
>>>
>>
>>
>>> See the increase of time spent on the execution: 400+ ms instead of 7ms.
>>> And this is not a single incident, I see this increase on all queries
>>> like the above.
>>>
>>> The memory of the JVM should not be a problem, it's set to 2Gb and only
>>> 800Mb is used at the moment the queries are slow. Restarting the
>>> application does not help either.
>>>
>>
>> No, this seems logical to me. The memory is consumed by internal
>> lucene term enums. I am quite sure what your issue is, but did not
>> test it, nor ever tried it myself. But, I have always wondered *how*
>> the fn:lower-case could have been implemented efficiently in
>> Jackrabbit. It doesn't fit into my understanding of how inverted
>> indexes work, which is what Lucene is in the end. So, I am happy that my
>> understanding was correct, and unhappy that fn:lower-case does (again,
>> from the top of my head and looking at code only) not scale too well.
>>
>> I think in your setup a lot of time is spent in the CaseTermQuery,
>> which first traverses all your 1 million virtualpaths and lowercases
>> them. This cannot scale (neither in CPU nor in memory).
>>
>> So, would you like to give me an indication about the query execution
>> time without the fn:lower-case? I think it will drop to < 1 ms.
>>
>> I think you should try to get away without using fn:lower-case if
>> this works for you. Just make sure that you always store the virtualpath
>> property as lower-case: then you are fine.
>>
>>
>>> Again, any help will be appreciated.
>>>
>>
>> let me know if this helped,
>>
>> Regards Ard
>>
>>
>>> Dennis
>>>
>>>
>>>>> Furthermore, of course, index size matters as well
>>>>>
>>>>>
>
>
> --
> Dennis van der Laan
>
>