Posted to solr-user@lucene.apache.org by Trym Møller <tr...@sigmat.dk> on 2015/01/27 12:28:46 UTC

PostingsFormat block size

Hi

I have successfully created a really cool Lucene41x8PostingsFormat class
(a copy of the Lucene41PostingsFormat class modified to use 8 times the
default block size) and registered the format as required. In the
schema.xml I have created a string field type with this postingsFormat,
and lastly I'm using this field type for my id field. This all works
great, and as a consequence the .tip files of the Lucene index (segments)
are considerably smaller, and the same goes for the Solr JVM memory usage
(which was the end goal).
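
A rough illustration of why larger term blocks shrink the terms index
(this is not from the original setup): assume the .tip file holds on the
order of one entry per term block, and that unique ids share few prefixes
so blocks fill up to roughly the configured size. The class name, the
term count, and the baseline block size below are hypothetical, and the
model ignores how BlockTreeTermsWriter actually groups terms by prefix.

```java
public class BlockCountSketch {

    // Rough model only: number of blocks needed to hold termCount terms
    // when each block holds about itemsPerBlock entries (ceiling division).
    public static long estimateBlocks(long termCount, int itemsPerBlock) {
        if (termCount < 0 || itemsPerBlock <= 0) {
            throw new IllegalArgumentException("termCount must be >= 0 and itemsPerBlock > 0");
        }
        return (termCount + itemsPerBlock - 1) / itemsPerBlock;
    }

    public static void main(String[] args) {
        long uniqueIds = 100_000_000L;      // hypothetical number of id terms
        int baselineItems = 32;             // hypothetical baseline block size
        int enlargedItems = baselineItems * 8; // "8 times the default", as in the post

        long baselineBlocks = estimateBlocks(uniqueIds, baselineItems);
        long enlargedBlocks = estimateBlocks(uniqueIds, enlargedItems);

        System.out.println("blocks at baseline size: " + baselineBlocks);
        System.out.println("blocks at 8x size:       " + enlargedBlocks);
        // Fewer blocks means fewer index entries held in memory, but each
        // exact-id lookup may have to scan up to 8x as many entries per block.
        System.out.println("max entries scanned per lookup: "
                + baselineItems + " vs " + enlargedItems);
    }
}
```

Under this model the index entry count drops by about a factor of 8,
while the work inside a single block grows by the same factor, which is
the trade-off the question below is about.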

Now I need to find the consequences (besides the disk and memory usage)
of this change to the id field. I would expect id searches to be
slower. But when will Solr/Lucene do id searches? I myself have no user
scenarios where my documents are searched by the id value.

Thanks for any comments.

Best regards Trym


Re: PostingsFormat block size

Posted by Trym Møller <tr...@sigmat.dk>.
Hi

Thanks for your input.

I do not update existing docs, so that is not relevant in my
case, and I have just skipped that test case :-)
I have not been able to measure any significant change in
distributed searches or in direct searches for an id.

Did I miss something with your comment "Here it is"?

Best regards Trym

On 27-01-2015 17:22, Mikhail Khludnev wrote:
> Hm.. Those are not the blocks I'm familiar with. Regarding the performance
> impact of bigger ID blocks: it matters if you have <uniqueKey>ID</uniqueKey>
> and send updates for existing docs. IDs are also used for some of the
> distributed search stages, I suppose. Here it is.


Re: PostingsFormat block size

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hm.. Those are not the blocks I'm familiar with. Regarding the performance
impact of bigger ID blocks: it matters if you have <uniqueKey>ID</uniqueKey>
and send updates for existing docs. IDs are also used for some of the
distributed search stages, I suppose. Here it is.
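
To make the update case concrete, here is a toy sketch (not Lucene's or
Solr's actual implementation): when a uniqueKey is defined, adding a
document whose id may already exist means any older copy has to be
located by id first, so each such update costs id lookups even if no user
query ever searches by id. The class and method names are hypothetical,
and a "segment" is modelled as a plain map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UniqueKeyUpdateSketch {

    // Each toy "segment" maps id -> document body.
    private final List<Map<String, String>> segments = new ArrayList<>();
    private int idLookups = 0;

    public UniqueKeyUpdateSketch() {
        segments.add(new HashMap<>());
    }

    // Start writing to a fresh segment (loosely analogous to a flush).
    public void newSegment() {
        segments.add(new HashMap<>());
    }

    // Overwrite semantics: look the id up in every segment, drop any old
    // copy, then add the new document to the newest segment.
    public void addOrReplace(String id, String body) {
        for (Map<String, String> segment : segments) {
            idLookups++;          // one id lookup per segment
            segment.remove(id);
        }
        segments.get(segments.size() - 1).put(id, body);
    }

    // Total id lookups performed so far by addOrReplace.
    public int idLookups() {
        return idLookups;
    }

    // Number of segments currently holding a copy of the given id.
    public int liveCopies(String id) {
        int copies = 0;
        for (Map<String, String> segment : segments) {
            if (segment.containsKey(id)) {
                copies++;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        UniqueKeyUpdateSketch index = new UniqueKeyUpdateSketch();
        index.addOrReplace("doc-1", "first version");
        index.newSegment();
        index.addOrReplace("doc-1", "second version");
        System.out.println("id lookups: " + index.idLookups());
        System.out.println("live copies of doc-1: " + index.liveCopies("doc-1"));
    }
}
```

In this model the lookup count grows with both the number of updates and
the number of segments, so a slower per-lookup cost on the id field would
show up mainly in indexing throughput for workloads that overwrite
documents.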

On Tue, Jan 27, 2015 at 4:33 PM, Trym Møller <tr...@sigmat.dk> wrote:

> Hi
>
> Thanks for your clarifying questions.
>
> In the constructor of the Lucene41PostingsFormat class the minimum and
> maximum block sizes are provided. These sizes are used when creating the
> BlockTreeTermsWriter (responsible for writing the .tim and .tip files of
> the Lucene index). It is the block sizes of the BlockTreeTermsWriter I
> refer to.
>
> I'm not quite sure I understand your second question - sorry.
> I have not tried whether the PulsingPostingsFormat helps lower the Solr
> JVM memory usage, but I can see that the same BlockTreeTermsWriter, with
> its block sizes, is used by the PulsingPostingsFormat.
> Should I expect something else from the PulsingPostingsFormat with regard
> to memory usage or searching (if I have changed the block sizes of the
> BlockTreeTermsWriter)?
>
> Best regards Trym


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: PostingsFormat block size

Posted by Trym Møller <tr...@sigmat.dk>.
Hi

Thanks for your clarifying questions.

In the constructor of the Lucene41PostingsFormat class the minimum and
maximum block sizes are provided. These sizes are used when creating the
BlockTreeTermsWriter (responsible for writing the .tim and .tip files of
the Lucene index). It is the block sizes of the BlockTreeTermsWriter I
refer to.

I'm not quite sure I understand your second question - sorry.
I have not tried whether the PulsingPostingsFormat helps lower the Solr
JVM memory usage, but I can see that the same BlockTreeTermsWriter, with
its block sizes, is used by the PulsingPostingsFormat.
Should I expect something else from the PulsingPostingsFormat with regard
to memory usage or searching (if I have changed the block sizes of the
BlockTreeTermsWriter)?

Best regards Trym

On 27-01-2015 14:00, Mikhail Khludnev wrote:
> Hello Trym,
>
> Can you clarify which blockSize you mean? And a second question, just to
> avoid unnecessary explanation: do you know what Pulsing is?


Re: PostingsFormat block size

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello Trym,

Can you clarify which blockSize you mean? And a second question, just to
avoid unnecessary explanation: do you know what Pulsing is?

On Tue, Jan 27, 2015 at 2:28 PM, Trym Møller <tr...@sigmat.dk> wrote:

> Hi
>
> I have successfully created a really cool Lucene41x8PostingsFormat class (a
> copy of the Lucene41PostingsFormat class modified to use 8 times the
> default block size) and registered the format as required. In the
> schema.xml I have created a string field type with this postingsFormat,
> and lastly I'm using this field type for my id field. This all works great,
> and as a consequence the .tip files of the Lucene index (segments) are
> considerably smaller, and the same goes for the Solr JVM memory usage
> (which was the end goal).
>
> Now I need to find the consequences (besides the disk and memory usage) of
> this change to the id field. I would expect id searches to be slower.
> But when will Solr/Lucene do id searches? I myself have no user scenarios
> where my documents are searched by the id value.
>
> Thanks for any comments.
>
> Best regards Trym
>
>

