Posted to dev@ignite.apache.org by Dmitriy Pavlov <dp...@gmail.com> on 2018/10/08 13:18:54 UTC

Mark dirty for DataPage: small changes in huge objects.

Hi Igniters,

I'd like to share a case from the previous version of the TC Bot. It is
a kind of REST response cache, <RestParms, Response>:
class Response {
  Long tsRefreshed;   // timestamp of the last call to the real service
  List<Build> builds; // a huge list of builds; most of the time it does not change
}

It seems the timestamp (ts) is at the same offset in every entry and takes
only 8 bytes. The builds data, in contrast, will occupy a number of pages
in durable memory, probably 10-20 or more.

So if the REST (real) service responds with the same builds content, only
the ts is updated. After that, I called cache.put(restParms, response).

So my question is: will such an update, which affects only one field,
mark 1 page as dirty or all 20? Judging by the checkpoint volume, I
suspect that we mark all pages as dirty even if their content is not
modified. If so, I would like to suggest a slight change to Ignite: for
data pages, mark as dirty only those pages whose content was actually
modified.

I understand that the previous implementation in the Bot was quite naive
(it has since been changed), but still, what if we checked for
modifications with a memory compare before marking a page? Marking pages
dirty unconditionally seems to cause additional data to be flushed to
disk on the next checkpoint.
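
A minimal Java sketch of the mem-compare idea, under simplifying
assumptions: 4 KB pages without page headers, a made-up entry layout with
the 8-byte timestamp in the first page followed by about 80 KB of builds
data, and hypothetical names (DirtyPageCount, toPages, changedPages) that
are not Ignite APIs. It compares each page before and after the timestamp
update and counts how many pages would really need to be flushed.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class DirtyPageCount {
    // Assumed page size; real Ignite pages also carry headers, ignored here.
    static final int PAGE_SIZE = 4096;

    /** Splits a serialized entry into fixed-size pages (last page zero-padded). */
    static byte[][] toPages(byte[] entry) {
        int n = (entry.length + PAGE_SIZE - 1) / PAGE_SIZE;
        byte[][] pages = new byte[n][];
        for (int i = 0; i < n; i++) {
            pages[i] = Arrays.copyOfRange(entry, i * PAGE_SIZE, (i + 1) * PAGE_SIZE);
        }
        return pages;
    }

    /** Counts pages whose bytes actually differ ("mem-compare before marking"). */
    static int changedPages(byte[][] before, byte[][] after) {
        int dirty = 0;
        for (int i = 0; i < before.length; i++) {
            if (!Arrays.equals(before[i], after[i])) dirty++;
        }
        return dirty;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 8-byte timestamp at offset 0, then ~80 KB of builds data.
        byte[] entry = new byte[8 + 20 * PAGE_SIZE];
        Arrays.fill(entry, 8, entry.length, (byte) 7);
        ByteBuffer.wrap(entry).putLong(0, 1_000L);
        byte[][] before = toPages(entry);

        ByteBuffer.wrap(entry).putLong(0, 2_000L); // only tsRefreshed changes
        byte[][] after = toPages(entry);

        System.out.println("pages in entry:       " + after.length);
        System.out.println("dirty if mark-all:    " + after.length);
        System.out.println("dirty if mem-compare: " + changedPages(before, after));
    }
}
```

With this layout the entry spans 21 pages: marking all of them dirty
writes 21 pages at the next checkpoint, while comparing first leaves only
the page holding the timestamp dirty.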

I would appreciate it if Native Persistence experts could point me to the
place in the code where such updates are performed (maybe I'm missing
something).

Sincerely,
Dmitriy Pavlov

Re: Mark dirty for DataPage: small changes in huge objects.

Posted by Ivan Rakov <iv...@gmail.com>.
I agree that such an option is hard to explain and would complicate data
storage tuning (which is already not simple).
The problem is that, so far, we don't divide pages into overflow and
non-overflow ones. We need to see benchmark results first: there's a
chance that the negative effects will be insignificant and the option
won't be needed at all. Otherwise, we may come up with a heuristic that
minimizes the negative effect, e.g. apply bytewise comparison only to
data pages that hold a single payload item.
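
The single-payload-item heuristic could look roughly like the following
Java sketch. The page descriptor, its fields and the method names are
hypothetical, not Ignite's page-memory API. A page holding exactly one
payload item is used here as a rough stand-in for an overflow page, since
pages are not classified as overflow/non-overflow; such a page is often,
though not always, a fragment of a multi-page entry.

```java
import java.util.Arrays;

class HeuristicDirtyMarking {
    /** Hypothetical page descriptor; Ignite's real page metadata looks different. */
    static final class PageView {
        final boolean dataPage;
        final int payloadItems;

        PageView(boolean dataPage, int payloadItems) {
            this.dataPage = dataPage;
            this.payloadItems = payloadItems;
        }
    }

    /** Heuristic: pay for a bytewise compare only on data pages holding one payload item. */
    static boolean shouldCompare(PageView page) {
        return page.dataPage && page.payloadItems == 1;
    }

    /** Dirty flag to set on write unlock, given the caller's markDirty hint. */
    static boolean dirtyOnUnlock(PageView page, boolean markDirty, byte[] before, byte[] after) {
        if (!markDirty)
            return false;                     // caller says nothing changed
        if (!shouldCompare(page))
            return true;                      // as today: trust the hint, skip the compare
        return !Arrays.equals(before, after); // compare only where the heuristic allows it
    }

    public static void main(String[] args) {
        byte[] same = new byte[4096];
        PageView fragment = new PageView(true, 1); // e.g. one fragment of a large entry
        PageView packed = new PageView(true, 12);  // page shared by many small entries

        System.out.println(dirtyOnUnlock(fragment, true, same, same.clone())); // false
        System.out.println(dirtyOnUnlock(packed, true, same, same.clone()));   // true
    }
}
```

Pages packed with many small entries keep the current, cheaper behaviour,
so the extra copy and compare are spent only where a small update is
likely to leave most of the page untouched.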

Best Regards,
Ivan Rakov

On 08.10.2018 18:25, Vladimir Ozerov wrote:

Re: Mark dirty for DataPage: small changes in huge objects.

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Can we use this mode for overflow pages and not use it for normal entries
which fit in a single page?
In general, users try to avoid fine-grained tuning options, as they are
very complex to understand. We should try to avoid any new configuration
options.

On Mon, Oct 8, 2018 at 5:51 PM Ivan Rakov <iv...@gmail.com> wrote:

Re: Mark dirty for DataPage: small changes in huge objects.

Posted by Ivan Rakov <iv...@gmail.com>.
Huge +1.

The page dirty flag is set in the body of PageMemoryImpl#writeUnlockPage.
The caller passes the boolean flag "markDirty=true" if it assumes that the
page content may have changed (the dirty flag is set even if the page
content remained intact). Instead, we can dump the page content to a
thread-local buffer after successfully acquiring the write lock and
compare it bytewise with the new content on write unlock.
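
A minimal Java model of this proposal: snapshot the page into a
thread-local buffer when the write lock is taken, and on unlock set the
dirty flag only if markDirty was passed and the bytes actually differ.
SnapshotComparingPage, its methods and the 4 KB page size are assumptions
for illustration; this is not PageMemoryImpl. Using one snapshot buffer
per thread also assumes a thread holds at most one page write lock at a
time; code that locks several pages at once would need one buffer per
held lock.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical page model illustrating "snapshot on write lock, compare on unlock". */
class SnapshotComparingPage {
    static final int PAGE_SIZE = 4096; // assumed page size

    // One reusable snapshot buffer per thread, so locking does not allocate.
    // Assumes a thread write-locks at most one page at a time.
    private static final ThreadLocal<byte[]> SNAPSHOT =
        ThreadLocal.withInitial(() -> new byte[PAGE_SIZE]);

    private final byte[] content = new byte[PAGE_SIZE];
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean dirty;

    /** Acquires the write lock and copies the current content aside. */
    byte[] writeLock() {
        lock.writeLock().lock();
        System.arraycopy(content, 0, SNAPSHOT.get(), 0, PAGE_SIZE);
        return content; // caller mutates the page in place while holding the lock
    }

    /** Releases the lock; marks dirty only if markDirty is set AND the bytes differ. */
    void writeUnlock(boolean markDirty) {
        try {
            if (markDirty && !Arrays.equals(content, SNAPSHOT.get())) {
                dirty = true;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isDirty() {
        return dirty;
    }

    public static void main(String[] args) {
        SnapshotComparingPage page = new SnapshotComparingPage();

        byte[] buf = page.writeLock();
        buf[100] = 0; // writes the value the page already holds
        page.writeUnlock(true);
        System.out.println("dirty after no-op write: " + page.isDirty()); // false

        buf = page.writeLock();
        buf[100] = 42; // real modification
        page.writeUnlock(true);
        System.out.println("dirty after real write:  " + page.isDirty()); // true
    }
}
```

The copy happens once per write-lock cycle and the comparison once per
unlock, which is exactly the extra CPU cost weighed against saved disk
writes below.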

I believe this logic should be introduced as a separate data storage
mode, as it has both positive and negative effects.

Positive:
Small updates of large entries will produce far fewer dirty pages. This
can dramatically boost update performance, especially when an SQL update
of a single field is performed over large objects.
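
A rough worked estimate of the checkpoint write volume, using
illustrative numbers only: 4 KB pages, 1,024 large entries updated
between checkpoints, 20 pages per entry (the upper end of the original
TC Bot case) and exactly one page actually changed per entry. These
figures are assumptions, not benchmark results.

```java
class CheckpointVolumeEstimate {
    /** Bytes written at checkpoint if pagesPerEntry pages of each entry are dirty. */
    static long checkpointBytes(long entries, long pagesPerEntry, long pageSize) {
        return entries * pagesPerEntry * pageSize;
    }

    public static void main(String[] args) {
        long pageSize = 4096; // assumed page size
        long entries = 1_024; // assumed number of updated large entries
        long mib = 1024 * 1024;

        long markAll = checkpointBytes(entries, 20, pageSize); // every page of every entry
        long compare = checkpointBytes(entries, 1, pageSize);  // only the page holding ts

        System.out.println("mark all: " + markAll / mib + " MiB"); // 80 MiB
        System.out.println("compare:  " + compare / mib + " MiB"); // 4 MiB
    }
}
```

Under these assumptions the comparison cuts checkpoint writes for this
workload by a factor of 20; real savings depend on the entry layout and
on how many pages an update actually touches.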

Negative:
CPU consumption and latency will increase, since we'll need some time to
copy and compare page content. Still, a lack of disk IOPS hits us much
more often than a lack of CPU; benchmarks will show whether the impact is
perceptible.

Let's file a ticket for this task unless there are any objections.

Best Regards,
Ivan Rakov

On 08.10.2018 16:18, Dmitriy Pavlov wrote: