Posted to oak-dev@jackrabbit.apache.org by Francesco Mari <ma...@gmail.com> on 2016/07/22 09:32:38 UTC

Are dumb segments dumb?

Hi,

Yesterday I took some time for a little experiment: how many
optimisations can be removed from the current segment format while
maintaining the same functionality?

I did some work in a branch on GitHub [1]. The code on that branch is
similar to the current trunk except for the following changes:

1. Record IDs are always serialised in their entirety. As such, a
serialised record ID occupies 18 bytes instead of 3.

2. Because of the previous change, the table of referenced segment IDs
is not needed anymore, so I removed it from the segment header. It
turns out that this table is indeed needed for the mark phase of
compaction, so this feature is broken in that branch.
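
To make the two sizes concrete, here is a minimal sketch in Java. It
assumes that a full record ID is a 16-byte segment UUID followed by a
2-byte offset, and that the compact form is a 1-byte index into the
table of referenced segment IDs followed by the same 2-byte offset. The
class and method names are hypothetical and are not taken from
oak-segment-tar.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical illustration only; not the oak-segment-tar serialisation code.
final class RecordIdSizes {

    // Full form (assumed layout): 16-byte segment UUID + 2-byte offset.
    static byte[] full(UUID segmentId, int offset) {
        checkOffset(offset);
        return ByteBuffer.allocate(18)
                .putLong(segmentId.getMostSignificantBits())
                .putLong(segmentId.getLeastSignificantBits())
                .putShort((short) offset)
                .array();
    }

    // Compact form (assumed layout): 1-byte index into the segment's
    // table of referenced segment IDs + 2-byte offset.
    static byte[] compact(int tableIndex, int offset) {
        if (tableIndex < 0 || tableIndex > 0xFF) {
            throw new IllegalArgumentException("index does not fit in one byte");
        }
        checkOffset(offset);
        return ByteBuffer.allocate(3)
                .put((byte) tableIndex)
                .putShort((short) offset)
                .array();
    }

    private static void checkOffset(int offset) {
        if (offset < 0 || offset > 0xFFFF) {
            throw new IllegalArgumentException("offset does not fit in two bytes");
        }
    }

    public static void main(String[] args) {
        UUID segmentId = UUID.randomUUID();
        System.out.println("full:    " + full(segmentId, 1024).length + " bytes");
        System.out.println("compact: " + compact(7, 1024).length + " bytes");
    }
}
```

Under these assumptions every serialised reference grows by a factor of
18 / 3 = 6. A one-byte index can also address only 256 table entries,
which is the kind of per-segment limit that goes away once full IDs are
written.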

Anyway, since the code is in a runnable state, I generated some
content using the current trunk and the dumber version of
oak-segment-tar. This is the repository created by the dumb
oak-segment-tar:

524744 data00000a.tar
524584 data00001a.tar
524688 data00002a.tar
460896 data00003a.tar
8 journal.log
0 repo.lock

This is the one created by the current trunk:

524864 data00000a.tar
524656 data00001a.tar
524792 data00002a.tar
297288 data00003a.tar
8 journal.log
0 repo.lock

The process that generates the content doesn't change between the two
executions, and the generated content is coming from a real world
scenario. For those familiar with it, the content is generated by an
installation of Adobe Experience Manager.

It looks like the size of the repository is not changing that much.
Probably the de-optimisation in the small is dwarfed by the binary
content in the large. Another effect of my change is that there is no
limit on the number of referenced segment IDs per segment, and this
might allow segments to pack more records than before.

Open questions aside, the clear advantage of this change is a great
simplification of the code. I guess I can remove some more lines, but
what I peeled off is already a considerable amount. Look at the code!

Francesco

[1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb

Re: Are dumb segments dumb?

Posted by Michael Dürig <md...@apache.org>.

Thanks Francesco for putting this together. I have yet to look at the 
patch and will comment on the issue re. technicalities.

I'm sure logical record ids will open up new ways to further improve 
segment store gc. They can be seen as a more sophisticated solution to 
the "equality" problem I described in OAK-3348 [1]: "the compaction map 
is serving three different concerns: deduplication, equality (of node 
states across different GC generations) and tracking GC generations."

Michael

[1] 
https://issues.apache.org/jira/browse/OAK-3348?focusedCommentId=15190912&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15190912


Re: Are dumb segments dumb?

Posted by Francesco Mari <ma...@gmail.com>.

While the testing effort on dumb segments is ongoing, I opened
OAK-4659 and attached a patch to it. This change is based on the dumb
segments, and improves the format by implementing logical record IDs.
This way, records can be addressed by a record number instead of by
their offsets inside the segment.
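
As a rough illustration of the idea (not the code in the OAK-4659
patch), the sketch below assumes that each segment keeps a small table
mapping record numbers to offsets. The class RecordTable and its methods
are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-segment record table; not the OAK-4659 patch.
final class RecordTable {

    // record number -> current offset of the record inside the segment
    private final Map<Integer, Integer> offsets = new HashMap<>();
    private int nextNumber = 0;

    // Registers a record stored at the given offset and returns its number.
    int add(int offset) {
        int number = nextNumber++;
        offsets.put(number, offset);
        return number;
    }

    // Resolves a record number to the offset where the record currently lives.
    int offsetOf(int number) {
        Integer offset = offsets.get(number);
        if (offset == null) {
            throw new IllegalArgumentException("unknown record number " + number);
        }
        return offset;
    }

    // Moves a record; references that hold its number stay valid.
    void relocate(int number, int newOffset) {
        offsetOf(number); // fails for unknown numbers
        offsets.put(number, newOffset);
    }

    public static void main(String[] args) {
        RecordTable table = new RecordTable();
        int node = table.add(4096);
        table.relocate(node, 8192);
        System.out.println(node + " -> " + table.offsetOf(node));
    }
}
```

The point of the indirection is that a reference stores the record
number only, so the physical offset can change without touching the
records that point to it.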

2016-07-27 17:06 GMT+02:00 Michael Dürig <md...@apache.org>:
>
> Looks good! I think we should give this one a spin. Some minor points we
> should keep an eye on before we commit this though:
>
> - does tooling still work with the changes in the segment format? Some of
> them access the segments directly such that expanding the segment header by
> 2 bytes might break them.
>
> - have a look at the micro benchmarks and compare to before.
>
> - remind us to remember ;-) updating the documentation of the segment format
> at some point
>
> - I would like to have something along the lines of the segment size test
> back. Probably not as a unit test but more as a benchmark for record sizes.
> So instead of it failing the build, it would output some numbers which we
> could then graph in much the same way as for the performance benchmarks.
>
> Michael
>
>
>
> On 26.7.16 11:47 , Francesco Mari wrote:
>>
>> With my latest commits on this branch [1] I enabled every previously
>> ignored test, fixing them when needed. The only two exceptions are
>> RecordUsageAnalyserTest and SegmentSizeTest, which were simply deleted.
>> I also added a couple of tests to cover the cases that work slightly
>> differently than before.
>>
>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>>
>> 2016-07-25 17:48 GMT+02:00 Francesco Mari <ma...@gmail.com>:
>>>
>>> It might be a variation in the process I tried. This shouldn't affect
>>> the statistics much anyway, given that the sample is big enough in
>>> both cases.
>>>
>>> 2016-07-25 17:46 GMT+02:00 Michael Dürig <md...@apache.org>:
>>>>
>>>>
>>>> Interesting numbers. Most of them look as I would have expected. I.e. the
>>>> distributions in the dumb case are more regular (smaller std. dev, mean and
>>>> median closer to each other), bigger segment sizes, etc.
>>>>
>>>> What I don't understand is the total number of records. These numbers differ
>>>> greatly between current and dumb. Is this a test artefact (i.e. test not
>>>> reproducible) or are we missing out on something.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On 25.7.16 4:01 , Francesco Mari wrote:
>>>>>
>>>>>
>>>>> I put together some statistics [1] for the process I described above.
>>>>> The "dumb" variant requires more segments to store the same amount of
>>>>> data, because of the increased size of serialised record IDs. As you
>>>>> can see, the number of records per segment is definitely lower in the
>>>>> dumb variant.
>>>>>
>>>>> On the other hand, ignoring the growth of the segment ID reference table
>>>>> seems to be a good choice. As shown by the average segment size,
>>>>> dumb segments are usually fuller than their counterparts. Moreover, a
>>>>> lower standard deviation shows that it's more common to have full dumb
>>>>> segments.
>>>>>
>>>>> In addition, my analysis seems to have found a bug too. There are a
>>>>> lot of segments with no segment ID references and only one record,
>>>>> which is very likely to be the segment info. The flush thread writes
>>>>> the current segment buffer every 5 seconds, provided that the buffer
>>>>> is not empty. It turns out that a segment buffer is never empty, since
>>>>> it always contains at least one record. As such, we are currently
>>>>> leaking an almost empty segment every 5 seconds, which wastes additional
>>>>> space on disk because of the padding required by the TAR format.
>>>>>
>>>>> [1]:
>>>>>
>>>>> https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>>>>>
>>>>> 2016-07-25 10:05 GMT+02:00 Michael Dürig <md...@apache.org>:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Jukka,
>>>>>>
>>>>>> Thanks for sharing your perspective and the historical background.
>>>>>>
>>>>>> I agree that repository size shouldn't be a primary concern. However, we
>>>>>> have seen many repositories (especially with an external data store) where
>>>>>> the content is extremely fine-grained. Much more than in an initial content
>>>>>> installation of CQ (which I believe was one of the initial setups for
>>>>>> collecting statistics). So we should at least understand the impact of the
>>>>>> patch in various scenarios.
>>>>>>
>>>>>> My main concern is the cache footprint of node records. Those are made up of
>>>>>> a list of record ids and would thus grow by a factor of 6 with the current
>>>>>> patch.
>>>>>>
>>>>>> Locality is not so much of a concern here. I would expect it to actually
>>>>>> improve as the patch gets rid of the 255 references limit of segments. A
>>>>>> limit which in practical deployments leads to degeneration of segment sizes
>>>>>> (I regularly see median sizes below 5k). See OAK-2896 for some background on
>>>>>> this.
>>>>>> Furthermore we already took a big step forward in improving locality in
>>>>>> concurrent write scenarios when we introduced the SegmentBufferWriterPool.
>>>>>> In essence: thread affinity for segments.
>>>>>>
>>>>>> We should probably be looking more carefully at the micro benchmarks. I
>>>>>> guess we neglected this part a bit in the past. Unfortunately CI
>>>>>> infrastructure isn't making this easy for us... OTOH those benchmarks only
>>>>>> tell you so much. Many of the problems we recently faced only surfaced in
>>>>>> the large: huge repos, high concurrent load, many days of traffic.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 23.7.16 12:34 , Jukka Zitting wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Cool! I'm pretty sure there are various ways in which the format could be
>>>>>>> improved, as the original design was based mostly on intuition, guided
>>>>>>> somewhat by collected stats <http://markmail.org/message/kxe3iy2hnodxsghe>
>>>>>>> and the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119>
>>>>>>> used to optimize common operations.
>>>>>>>
>>>>>>> Note though that the total size of the repository was not and probably
>>>>>>> shouldn't be a primary metric, since the size of a typical repository is
>>>>>>> governed mostly by binaries and string properties (though it's a good idea
>>>>>>> to make sure you avoid things like duplicates of large binaries). Instead
>>>>>>> the rationale for squeezing things like record ids to as few bytes as
>>>>>>> possible is captured in the principles listed in the original design doc
>>>>>>> <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>>>>>
>>>>>>>    - Compactness. The formatting of records is optimized for size to reduce
>>>>>>>    IO costs and to fit as much content in caches as possible. A node stored in
>>>>>>>    SegmentNodeStore typically consumes only a fraction of the size it would as
>>>>>>>    a bundle in Jackrabbit Classic.
>>>>>>>    - Locality. Segments are written so that related records, like a node
>>>>>>>    and its immediate children, usually end up stored in the same segment. This
>>>>>>>    makes tree traversals very fast and avoids most cache misses for typical
>>>>>>>    clients that access more than one related node per session.
>>>>>>>
>>>>>>> Thus I would recommend keeping an eye also on benchmark results in addition
>>>>>>> to raw repository size when evaluating possible improvements. Also, the
>>>>>>> number and size of data segments are good size metrics to look at in
>>>>>>> addition to total disk usage.
>>>>>>>
>>>>>>> BR,
>>>>>>>
>>>>>>> Jukka Zitting
>>>>>>>
>>>>>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari
>>>>>>> <ma...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The impact on repository size needs to be assessed with more specific
>>>>>>>> tests. In particular, I found RecordUsageAnalyserTest and
>>>>>>>> SegmentSizeTest unsuitable for this task. It's not a coincidence that
>>>>>>>> these tests are usually the first to be disabled or blindly updated
>>>>>>>> every time a small fix changes the size of the records.
>>>>>>>>
>>>>>>>> Regarding GC, the segment graph could be computed during the mark
>>>>>>>> phase. Of course, it's handy to have this information pre-computed for
>>>>>>>> you, but since the record graph is traversed anyway we could think
>>>>>>>> about dynamically reconstructing the segment graph when needed.
>>>>>>>>
>>>>>>>> There are still so many questions to answer, but I think that this
>>>>>>>> simplification exercise can be worth the effort.
>>>>>>>>
>>>>>>>> 2016-07-22 11:34 GMT+02:00 Michael Dürig <md...@apache.org>:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Neat! I would have expected a greater impact on the size of the segment
>>>>>>>>> store. But as you say it probably all depends on the binary/content ratio. I
>>>>>>>>> think we should look at the #references / repository size ratio for
>>>>>>>>> repositories of different structures and see how such a number differs with
>>>>>>>>> and without the patch.
>>>>>>>>>
>>>>>>>>> I like the patch as it fixes OAK-2896 while at the same time reducing
>>>>>>>>> complexity a lot.
>>>>>>>>>
>>>>>>>>> OTOH we need to figure out how to regain the lost functionality (e.g. gc)
>>>>>>>>> and assess its impact on repository size.
>>>>>>>>>
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>

Re: Are dumb segments dumb?

Posted by Michael Dürig <md...@apache.org>.

Looks good! I think we should give this one a spin. Some minor points we 
should keep an eye on before we commit this though:

- does tooling still work with the changes in the segment format? Some 
of them access the segments directly such that expanding the segment 
header by 2 bytes might break them.

- have a look at the micro benchmarks and compare to before.

- remind us to remember ;-) updating the documentation of the segment 
format at some point

- I would like to have something along the lines of the segment size
test back. Probably not as a unit test but more as a benchmark for
record sizes. So instead of it failing the build, it would output some
numbers which we could then graph in much the same way as for the
performance benchmarks.
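
A reporting step of this kind could look roughly like the sketch below.
It is only an illustration: the class SizeReport, the CSV layout and the
sample sizes are assumptions made for this example, and nothing here is
taken from the Oak benchmark suite. It prints count, mean, median and
(population) standard deviation for a set of sizes instead of asserting
on them.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical reporting helper; not part of Oak.
final class SizeReport {

    // Returns one CSV line: label,count,mean,median,stdDev
    static String summarise(String label, int[] sizes) {
        int[] sorted = sizes.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double mean = Arrays.stream(sorted).average().orElse(0);
        double median = 0;
        if (n > 0) {
            median = (n % 2 == 1)
                    ? sorted[n / 2]
                    : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        }
        double squares = 0;
        for (int size : sorted) {
            squares += (size - mean) * (size - mean);
        }
        double stdDev = n == 0 ? 0 : Math.sqrt(squares / n);
        return String.format(Locale.ROOT, "%s,%d,%.1f,%.1f,%.1f",
                label, n, mean, median, stdDev);
    }

    public static void main(String[] args) {
        // Made-up sizes, only to show the output format.
        int[] sizes = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("label,count,mean,median,stdDev");
        System.out.println(summarise("example", sizes));
    }
}
```

Because the output is a plain CSV line per run, a change in record sizes
shows up as a shifted data point in a graph rather than as a failed
build.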

Michael


On 26.7.16 11:47 , Francesco Mari wrote:
> With my latest commits on this branch [1] I enabled every previously
> ignored test, fixing them when needed. The only two exceptions are
> RecordUsageAnalyserTest and SegmentSizeTest, which were simply deleted.
> I also added a couple of tests to cover the cases that work slightly
> differently than before.
>
> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>
> 2016-07-25 17:48 GMT+02:00 Francesco Mari <ma...@gmail.com>:
>> It might be a variation in the process I tried. This shouldn't affect
>> the statistics much anyway, given that the sample is big enough in
>> both cases.
>>
>> 2016-07-25 17:46 GMT+02:00 Michael Dürig <md...@apache.org>:
>>>
>>> Interesting numbers. Most of them look as I would have expected. I.e. the
>>> distributions in the dumb case are more regular (smaller std. dev, mean and
>>> median closer to each other), bigger segment sizes, etc.
>>>
>>> What I don't understand is the total number of records. These numbers differ
>>> greatly between current and dumb. Is this a test artefact (i.e. test not
>>> reproducible) or are we missing out on something.
>>>
>>> Michael
>>>
>>>
>>> On 25.7.16 4:01 , Francesco Mari wrote:
>>>>
>>>> I put together some statistics [1] for the process I described above.
>>>> The "dumb" variant requires more segments to store the same amount of
>>>> data, because of the increased size of serialised record IDs. As you
>>>> can see, the number of records per segment is definitely lower in the
>>>> dumb variant.
>>>>
>>>> On the other hand, ignoring the growth of the segment ID reference table
>>>> seems to be a good choice. As shown by the average segment size,
>>>> dumb segments are usually fuller than their counterparts. Moreover, a
>>>> lower standard deviation shows that it's more common to have full dumb
>>>> segments.
>>>>
>>>> In addition, my analysis seems to have found a bug too. There are a
>>>> lot of segments with no segment ID references and only one record,
>>>> which is very likely to be the segment info. The flush thread writes
>>>> the current segment buffer every 5 seconds, provided that the buffer
>>>> is not empty. It turns out that a segment buffer is never empty, since
>>>> it always contains at least one record. As such, we are currently
>>>> leaking an almost empty segment every 5 seconds, which wastes additional
>>>> space on disk because of the padding required by the TAR format.
>>>>
>>>> [1]:
>>>> https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>>>>
>>>> 2016-07-25 10:05 GMT+02:00 Michael D�rig <md...@apache.org>:
>>>>>
>>>>>
>>>>> Hi Jukka,
>>>>>
>>>>> Thanks for sharing your perspective and the historical background.
>>>>>
>>>>> I agree that repository size shouldn't be a primary concern. However, we
>>>>> have seen many repositories (especially with an external data store)
>>>>> where
>>>>> the content is extremely fine granular. Much more than in an initial
>>>>> content
>>>>> installation of CQ (which I believe was one of the initial setup for
>>>>> collecting statistics). So we should at least understand the impact of
>>>>> the
>>>>> patch in various scenarios.
>>>>>
>>>>> My main concern is the cache footprint of node records. Those are made up
>>>>> of
>>>>> a list of record ids and would thus grow by a factor of 6 with the
>>>>> current
>>>>> patch.
>>>>>
>>>>> Locality is not so much of concern here. I would expect it to actually
>>>>> improve as the patch gets rid of the 255 references limit of segments. A
>>>>> limit which in practical deployments leads to degeneration of segment
>>>>> sizes
>>>>> (I regularly see median sizes below 5k). See OAK-2896 for some background
>>>>> on
>>>>> this.
>>>>> Furthermore we already did a big step forward in improving locality in
>>>>> concurrent write scenarios when we introduced the
>>>>> SegmentBufferWriterPool.
>>>>> In essence: thread affinity for segments.
>>>>>
>>>>> We should probably be more carefully looking at the micro benchmarks. I
>>>>> guess we neglected this part a bit in the past. Unfortunately CI
>>>>> infrastructure isn't making this easy for us... OTOH those benchmarks
>>>>> only
>>>>> tell you so much. Many of the problems we recently faced only surfaced in
>>>>> the large: huge repos, high concurrent load, many days of traffic.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 23.7.16 12:34 , Jukka Zitting wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Cool! I'm pretty sure there are various ways in which the format could
>>>>>> be
>>>>>> improved, as the original design was based mostly on intuition, guided
>>>>>> somewhat by collected stats
>>>>>> <http://markmail.org/message/kxe3iy2hnodxsghe>
>>>>>> and
>>>>>> the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119>
>>>>>> used
>>>>>> to optimize common operations.
>>>>>>
>>>>>> Note though that the total size of the repository was not and probably
>>>>>> shouldn't be a primary metric, since the size of a typical repository is
>>>>>> governed mostly by binaries and string properties (though it's a good
>>>>>> idea
>>>>>> to make sure you avoid things like duplicates of large binaries).
>>>>>> Instead
>>>>>> the rationale for squeezing things like record ids to as few bytes as
>>>>>> possible is captured in the principles listed in the original design doc
>>>>>> <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>>>>
>>>>>>    - Compactness. The formatting of records is optimized for size to
>>>>>> reduce
>>>>>>    IO costs and to fit as much content in caches as possible. A node
>>>>>> stored in
>>>>>>    SegmentNodeStore typically consumes only a fraction of the size it
>>>>>> would as
>>>>>>    a bundle in Jackrabbit Classic.
>>>>>>    - Locality. Segments are written so that related records, like a node
>>>>>>    and its immediate children, usually end up stored in the same
>>>>>> segment.
>>>>>> This
>>>>>>    makes tree traversals very fast and avoids most cache misses for
>>>>>> typical
>>>>>>    clients that access more than one related node per session.
>>>>>>
>>>>>> Thus I would recommend keeping an eye also on benchmark results in
>>>>>> addition
>>>>>> to raw repository size when evaluating possible improvements. Also, the
>>>>>> number and size of data segments are good size metrics to look at in
>>>>>> addition to total disk usage.
>>>>>>
>>>>>> BR,
>>>>>>
>>>>>> Jukka Zitting
>>>>>>
>>>>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari
>>>>>> <ma...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The impact on repository size needs to be assessed with more specific
>>>>>>> tests. In particular, I found RecordUsageAnalyserTest and
>>>>>>> SegmentSizeTest unsuitable to this task. It's not a coincidence that
>>>>>>> these tests are usually the first to be disabled or blindly updated
>>>>>>> every time a small fix changes the size of the records.
>>>>>>>
>>>>>>> Regarding GC, the segment graph could be computed during the mark
>>>>>>> phase. Of course, it's handy to have this information pre-computed for
>>>>>>> you, but since the record graph is traversed anyway we could think
>>>>>>> about dynamically reconstructing the segment graph when needed.
>>>>>>>
>>>>>>> There are still so many questions to answer, but I think that this
>>>>>>> simplification exercise can be worth the effort.
>>>>>>>
>>>>>>> 2016-07-22 11:34 GMT+02:00 Michael D�rig <md...@apache.org>:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Neat! I would have expected a greater impact on the size of the
>>>>>>>> segment
>>>>>>>> store. But as you say it probably all depends on the binary/content
>>>>>>>
>>>>>>>
>>>>>>> ratio. I
>>>>>>>>
>>>>>>>>
>>>>>>>> think we should look at the #references / repository size ratio for
>>>>>>>> repositories of different structures and see how such a number differs
>>>>>>>
>>>>>>>
>>>>>>> with
>>>>>>>>
>>>>>>>>
>>>>>>>> and without the patch.
>>>>>>>>
>>>>>>>> I like the patch as it fixes OAK-2896 while at the same time reducing
>>>>>>>> complexity a lot.
>>>>>>>>
>>>>>>>> OTOH we need to figure out how to regain the lost functionality (e.g.
>>>>>>>> gc)
>>>>>>>> and asses its impact on repository size.
>>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22.7.16 11:32 , Francesco Mari wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Yesterday I took some time for a little experiment: how many
>>>>>>>>> optimisations can be removed from the current segment format while
>>>>>>>>> maintaining the same functionality?
>>>>>>>>>
>>>>>>>>> I made some work in a branch on GitHub [1]. The code on that branch
>>>>>>>>> is
>>>>>>>>> similar to the current trunk except for the following changes:
>>>>>>>>>
>>>>>>>>> 1. Record IDs are always serialised in their entirety. As such, a
>>>>>>>>> serialised record ID occupies 18 bytes instead of 3.
>>>>>>>>>
>>>>>>>>> 2. Because of the previous change, the table of referenced segment
>>>>>>>>> IDs
>>>>>>>>> is not needed anymore, so I removed it from the segment header. It
>>>>>>>>> turns out that this table is indeed needed for the mark phase of
>>>>>>>>> compaction, so this feature is broken in that branch.
>>>>>>>>>
>>>>>>>>> Anyway, since the code is in a runnable state, I generated some
>>>>>>>>> content using the current trunk and the dumber version of
>>>>>>>>> oak-segment-tar. This is the repository created by the dumb
>>>>>>>>> oak-segment-tar:
>>>>>>>>>
>>>>>>>>> 524744 data00000a.tar
>>>>>>>>> 524584 data00001a.tar
>>>>>>>>> 524688 data00002a.tar
>>>>>>>>> 460896 data00003a.tar
>>>>>>>>> 8 journal.log
>>>>>>>>> 0 repo.lock
>>>>>>>>>
>>>>>>>>> This is the one created by the current trunk:
>>>>>>>>>
>>>>>>>>> 524864 data00000a.tar
>>>>>>>>> 524656 data00001a.tar
>>>>>>>>> 524792 data00002a.tar
>>>>>>>>> 297288 data00003a.tar
>>>>>>>>> 8 journal.log
>>>>>>>>> 0 repo.lock
>>>>>>>>>
>>>>>>>>> The process that generates the content doesn't change between the two
>>>>>>>>> executions, and the generated content is coming from a real world
>>>>>>>>> scenario. For those familiar with it, the content is generated by an
>>>>>>>>> installation of Adobe Experience Manager.
>>>>>>>>>
>>>>>>>>> It looks like that the size of the repository is not changing so
>>>>>>>>> much.
>>>>>>>>> Probably the de-optimisation in the small is dwarfed by the binary
>>>>>>>>> content in the large. Another effect of my change is that there is no
>>>>>>>>> limit on the number of referenced segment IDs per segment, and this
>>>>>>>>> might allow segments to pack more records than before.
>>>>>>>>>
>>>>>>>>> Questions apart, the clear advantage of this change is a great
>>>>>>>>> simplification of the code. I guess I can remove some lines more, but
>>>>>>>>> what I peeled off is already a considerable amount. Look at the code!
>>>>>>>>>
>>>>>>>>> Francesco
>>>>>>>>>
>>>>>>>>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>

Re: Are dumb segments dumb?

Posted by Francesco Mari <ma...@gmail.com>.

The recent discovery in OAK-4604 shows that my POC suffers from the
same problem. I fixed it in my latest commit.

2016-07-26 11:47 GMT+02:00 Francesco Mari <ma...@gmail.com>:
> With my latest commits on this branch [1] I enabled every previously
> ignored test, fixing them when needed. The only two exceptions are
> RecordUsageAnalyserTest and SegmentSizeTest, which were simply deleted.
> I also added a couple of tests to cover the cases that work slightly
> differently than before.
>
> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb

Re: Are dumb segments dumb?

Posted by Francesco Mari <ma...@gmail.com>.

With my latest commits on this branch [1] I enabled every previously
ignored test, fixing them when needed. The only two exceptions are
RecordUsageAnalyserTest and SegmentSizeTest, which were simply deleted.
I also added a couple of tests to cover the cases that work slightly
differently than before.

[1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb

2016-07-25 17:48 GMT+02:00 Francesco Mari <ma...@gmail.com>:
> It might be a variation in the process I tried. This shouldn't affect
> the statistics much anyway, given that the population sample is big
> enough in both cases.

Re: Are dumb segments dumb?

Posted by Francesco Mari <ma...@gmail.com>.

It might be a variation in the process I tried. This shouldn't affect
the statistics much anyway, given that the population sample is big
enough in both cases.

2016-07-25 17:46 GMT+02:00 Michael Dürig <md...@apache.org>:
>
> Interesting numbers. Most of them look as I would have expected. I.e. the
> distributions in the dumb case are more regular (smaller std. dev, mean and
> median closer to each other), bigger segment sizes, etc.
>
> What I don't understand is the total number of records. These numbers differ
> greatly between current and dumb. Is this a test artefact (i.e. test not
> reproducible) or are we missing something?
>
> Michael
>
>
> On 25.7.16 4:01 , Francesco Mari wrote:
>>
>> I put together some statistics [1] for the process I described above.
>> The "dumb" variant requires more segments to store the same amount of
>> data, because of the increased size of serialised record IDs. As you
>> can see, the number of records per segment is definitely lower in the
>> dumb variant.
>>
>> On the other hand, ignoring the growth of the segment ID reference
>> table seems to be a good choice. As the average segment size shows,
>> dumb segments are usually fuller than their counterparts. Moreover, a
>> lower standard deviation shows that it's more common to have full dumb
>> segments.
>>
>> In addition, my analysis seems to have found a bug too. There are a
>> lot of segments with no segment ID references and only one record,
>> which is very likely to be the segment info. The flush thread writes
>> every 5 seconds the current segment buffer, provided that the buffer
>> is not empty. It turns out that a segment buffer is never empty, since
>> it always contains at least one record. As such, we are currently
>> leaking almost empty segments every 5 seconds, that waste additional
>> space on disk because of the padding required by the TAR format.
>>
>> [1]:
>> https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>>
>> 2016-07-25 10:05 GMT+02:00 Michael Dürig <md...@apache.org>:
>>>
>>>
>>> Hi Jukka,
>>>
>>> Thanks for sharing your perspective and the historical background.
>>>
>>> I agree that repository size shouldn't be a primary concern. However, we
>>> have seen many repositories (especially with an external data store)
>>> where
>>> the content is extremely fine granular. Much more than in an initial
>>> content
>>> installation of CQ (which I believe was one of the initial setup for
>>> collecting statistics). So we should at least understand the impact of
>>> the
>>> patch in various scenarios.
>>>
>>> My main concern is the cache footprint of node records. Those are made up
>>> of
>>> a list of record ids and would thus grow by a factor of 6 with the
>>> current
>>> patch.
>>>
>>> Locality is not so much of concern here. I would expect it to actually
>>> improve as the patch gets rid of the 255 references limit of segments. A
>>> limit which in practical deployments leads to degeneration of segment
>>> sizes
>>> (I regularly see median sizes below 5k). See OAK-2896 for some background
>>> on
>>> this.
>>> Furthermore we already did a big step forward in improving locality in
>>> concurrent write scenarios when we introduced the
>>> SegmentBufferWriterPool.
>>> In essence: thread affinity for segments.
>>>
>>> We should probably be more carefully looking at the micro benchmarks. I
>>> guess we neglected this part a bit in the past. Unfortunately CI
>>> infrastructure isn't making this easy for us... OTOH those benchmarks
>>> only
>>> tell you so much. Many of the problems we recently faced only surfaced in
>>> the large: huge repos, high concurrent load, many days of traffic.
>>>
>>> Michael
>>>
>>>
>>>
>>>
>>>
>>> On 23.7.16 12:34 , Jukka Zitting wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> Cool! I'm pretty sure there are various ways in which the format could
>>>> be
>>>> improved, as the original design was based mostly on intuition, guided
>>>> somewhat by collected stats
>>>> <http://markmail.org/message/kxe3iy2hnodxsghe>
>>>> and
>>>> the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119>
>>>> used
>>>> to optimize common operations.
>>>>
>>>> Note though that the total size of the repository was not and probably
>>>> shouldn't be a primary metric, since the size of a typical repository is
>>>> governed mostly by binaries and string properties (though it's a good
>>>> idea
>>>> to make sure you avoid things like duplicates of large binaries).
>>>> Instead
>>>> the rationale for squeezing things like record ids to as few bytes as
>>>> possible is captured in the principles listed in the original design doc
>>>> <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>>
>>>>    - Compactness. The formatting of records is optimized for size to
>>>> reduce
>>>>    IO costs and to fit as much content in caches as possible. A node
>>>> stored in
>>>>    SegmentNodeStore typically consumes only a fraction of the size it
>>>> would as
>>>>    a bundle in Jackrabbit Classic.
>>>>    - Locality. Segments are written so that related records, like a node
>>>>    and its immediate children, usually end up stored in the same
>>>> segment.
>>>> This
>>>>    makes tree traversals very fast and avoids most cache misses for
>>>> typical
>>>>    clients that access more than one related node per session.
>>>>
>>>> Thus I would recommend keeping an eye also on benchmark results in
>>>> addition
>>>> to raw repository size when evaluating possible improvements. Also, the
>>>> number and size of data segments are good size metrics to look at in
>>>> addition to total disk usage.
>>>>
>>>> BR,
>>>>
>>>> Jukka Zitting
>>>>
>>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari
>>>> <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> The impact on repository size needs to be assessed with more specific
>>>>> tests. In particular, I found RecordUsageAnalyserTest and
>>>>> SegmentSizeTest unsuitable to this task. It's not a coincidence that
>>>>> these tests are usually the first to be disabled or blindly updated
>>>>> every time a small fix changes the size of the records.
>>>>>
>>>>> Regarding GC, the segment graph could be computed during the mark
>>>>> phase. Of course, it's handy to have this information pre-computed for
>>>>> you, but since the record graph is traversed anyway we could think
>>>>> about dynamically reconstructing the segment graph when needed.
>>>>>
>>>>> There are still so many questions to answer, but I think that this
>>>>> simplification exercise can be worth the effort.
>>>>>
>>>>> 2016-07-22 11:34 GMT+02:00 Michael Dürig <md...@apache.org>:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Neat! I would have expected a greater impact on the size of the
>>>>>> segment store. But as you say it probably all depends on the
>>>>>> binary/content ratio. I think we should look at the #references /
>>>>>> repository size ratio for repositories of different structures and
>>>>>> see how such a number differs with and without the patch.
>>>>>>
>>>>>> I like the patch as it fixes OAK-2896 while at the same time reducing
>>>>>> complexity a lot.
>>>>>>
>>>>>> OTOH we need to figure out how to regain the lost functionality (e.g.
>>>>>> gc) and assess its impact on repository size.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 22.7.16 11:32 , Francesco Mari wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yesterday I took some time for a little experiment: how many
>>>>>>> optimisations can be removed from the current segment format while
>>>>>>> maintaining the same functionality?
>>>>>>>
>>>>>>> I made some work in a branch on GitHub [1]. The code on that branch
>>>>>>> is
>>>>>>> similar to the current trunk except for the following changes:
>>>>>>>
>>>>>>> 1. Record IDs are always serialised in their entirety. As such, a
>>>>>>> serialised record ID occupies 18 bytes instead of 3.
>>>>>>>
>>>>>>> 2. Because of the previous change, the table of referenced segment
>>>>>>> IDs
>>>>>>> is not needed anymore, so I removed it from the segment header. It
>>>>>>> turns out that this table is indeed needed for the mark phase of
>>>>>>> compaction, so this feature is broken in that branch.
>>>>>>>
>>>>>>> Anyway, since the code is in a runnable state, I generated some
>>>>>>> content using the current trunk and the dumber version of
>>>>>>> oak-segment-tar. This is the repository created by the dumb
>>>>>>> oak-segment-tar:
>>>>>>>
>>>>>>> 524744 data00000a.tar
>>>>>>> 524584 data00001a.tar
>>>>>>> 524688 data00002a.tar
>>>>>>> 460896 data00003a.tar
>>>>>>> 8 journal.log
>>>>>>> 0 repo.lock
>>>>>>>
>>>>>>> This is the one created by the current trunk:
>>>>>>>
>>>>>>> 524864 data00000a.tar
>>>>>>> 524656 data00001a.tar
>>>>>>> 524792 data00002a.tar
>>>>>>> 297288 data00003a.tar
>>>>>>> 8 journal.log
>>>>>>> 0 repo.lock
>>>>>>>
>>>>>>> The process that generates the content doesn't change between the two
>>>>>>> executions, and the generated content is coming from a real world
>>>>>>> scenario. For those familiar with it, the content is generated by an
>>>>>>> installation of Adobe Experience Manager.
>>>>>>>
>>>>>>> It looks like that the size of the repository is not changing so
>>>>>>> much.
>>>>>>> Probably the de-optimisation in the small is dwarfed by the binary
>>>>>>> content in the large. Another effect of my change is that there is no
>>>>>>> limit on the number of referenced segment IDs per segment, and this
>>>>>>> might allow segments to pack more records than before.
>>>>>>>
>>>>>>> Questions apart, the clear advantage of this change is a great
>>>>>>> simplification of the code. I guess I can remove some lines more, but
>>>>>>> what I peeled off is already a considerable amount. Look at the code!
>>>>>>>
>>>>>>> Francesco
>>>>>>>
>>>>>>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

Re: Are dumb segments dumb?

Posted by Michael Dürig <md...@apache.org>.

Interesting numbers. Most of them look as I would have expected. I.e. 
the distributions in the dumb case are more regular (smaller std. dev, 
mean and median closer to each other), bigger segment sizes, etc.

What I don't understand is the total number of records. These numbers
differ greatly between current and dumb. Is this a test artefact (i.e.
the test is not reproducible) or are we missing something?

Michael

On 25.7.16 4:01 , Francesco Mari wrote:
> I put together some statistics [1] for the process I described above.
> The "dumb" variant requires more segments to store the same amount of
> data, because of the increased size of serialised record IDs.  As you
> can see the amount of records per segment is definitely lower in the
> dumb variant.
>
> On the other hand, ignoring the growth of segment ID reference table
> seems to be a good choice. As shown from the segment size average,
> dumb segments are usually fuller than their counterparts. Moreover, a
> lower standard deviation shows that it's more common to have full dumb
> segments.
>
> In addition, my analysis seems to have found a bug too. There are a
> lot of segments with no segment ID references and only one record,
> which is very likely to be the segment info. The flush thread writes
> every 5 seconds the current segment buffer, provided that the buffer
> is not empty. It turns out that a segment buffer is never empty, since
> it always contains at least one record. As such, we are currently
> leaking almost empty segments every 5 seconds, that waste additional
> space on disk because of the padding required by the TAR format.
>
> [1]: https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>
> 2016-07-25 10:05 GMT+02:00 Michael Dürig <md...@apache.org>:
>>
>> Hi Jukka,
>>
>> Thanks for sharing your perspective and the historical background.
>>
>> I agree that repository size shouldn't be a primary concern. However, we
>> have seen many repositories (especially with an external data store) where
>> the content is extremely fine granular. Much more than in an initial content
>> installation of CQ (which I believe was one of the initial setup for
>> collecting statistics). So we should at least understand the impact of the
>> patch in various scenarios.
>>
>> My main concern is the cache footprint of node records. Those are made up of
>> a list of record ids and would thus grow by a factor of 6 with the current
>> patch.
>>
>> Locality is not so much of concern here. I would expect it to actually
>> improve as the patch gets rid of the 255 references limit of segments. A
>> limit which in practical deployments leads to degeneration of segment sizes
>> (I regularly see median sizes below 5k). See OAK-2896 for some background on
>> this.
>> Furthermore we already did a big step forward in improving locality in
>> concurrent write scenarios when we introduced the SegmentBufferWriterPool.
>> In essence: thread affinity for segments.
>>
>> We should probably be more carefully looking at the micro benchmarks. I
>> guess we neglected this part a bit in the past. Unfortunately CI
>> infrastructure isn't making this easy for us... OTOH those benchmarks only
>> tell you so much. Many of the problems we recently faced only surfaced in
>> the large: huge repos, high concurrent load, many days of traffic.
>>
>> Michael
>>
>>
>>
>>
>>
>> On 23.7.16 12:34 , Jukka Zitting wrote:
>>>
>>> Hi,
>>>
>>> Cool! I'm pretty sure there are various ways in which the format could be
>>> improved, as the original design was based mostly on intuition, guided
>>> somewhat by collected stats <http://markmail.org/message/kxe3iy2hnodxsghe>
>>> and
>>> the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119> used
>>> to optimize common operations.
>>>
>>> Note though that the total size of the repository was not and probably
>>> shouldn't be a primary metric, since the size of a typical repository is
>>> governed mostly by binaries and string properties (though it's a good idea
>>> to make sure you avoid things like duplicates of large binaries). Instead
>>> the rationale for squeezing things like record ids to as few bytes as
>>> possible is captured in the principles listed in the original design doc
>>> <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>
>>>    - Compactness. The formatting of records is optimized for size to
>>> reduce
>>>    IO costs and to fit as much content in caches as possible. A node
>>> stored in
>>>    SegmentNodeStore typically consumes only a fraction of the size it
>>> would as
>>>    a bundle in Jackrabbit Classic.
>>>    - Locality. Segments are written so that related records, like a node
>>>    and its immediate children, usually end up stored in the same segment.
>>> This
>>>    makes tree traversals very fast and avoids most cache misses for
>>> typical
>>>    clients that access more than one related node per session.
>>>
>>> Thus I would recommend keeping an eye also on benchmark results in
>>> addition
>>> to raw repository size when evaluating possible improvements. Also, the
>>> number and size of data segments are good size metrics to look at in
>>> addition to total disk usage.
>>>
>>> BR,
>>>
>>> Jukka Zitting
>>>
>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari <ma...@gmail.com>
>>> wrote:
>>>
>>>> The impact on repository size needs to be assessed with more specific
>>>> tests. In particular, I found RecordUsageAnalyserTest and
>>>> SegmentSizeTest unsuitable to this task. It's not a coincidence that
>>>> these tests are usually the first to be disabled or blindly updated
>>>> every time a small fix changes the size of the records.
>>>>
>>>> Regarding GC, the segment graph could be computed during the mark
>>>> phase. Of course, it's handy to have this information pre-computed for
>>>> you, but since the record graph is traversed anyway we could think
>>>> about dynamically reconstructing the segment graph when needed.
>>>>
>>>> There are still so many questions to answer, but I think that this
>>>> simplification exercise can be worth the effort.
>>>>
>>>> 2016-07-22 11:34 GMT+02:00 Michael Dürig <md...@apache.org>:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Neat! I would have expected a greater impact on the size of the segment
>>>>> store. But as you say it probably all depends on the binary/content
>>>>> ratio. I think we should look at the #references / repository size
>>>>> ratio for repositories of different structures and see how such a
>>>>> number differs with and without the patch.
>>>>>
>>>>> I like the patch as it fixes OAK-2896 while at the same time reducing
>>>>> complexity a lot.
>>>>>
>>>>> OTOH we need to figure out how to regain the lost functionality (e.g.
>>>>> gc) and assess its impact on repository size.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>> On 22.7.16 11:32 , Francesco Mari wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Yesterday I took some time for a little experiment: how many
>>>>>> optimisations can be removed from the current segment format while
>>>>>> maintaining the same functionality?
>>>>>>
>>>>>> I made some work in a branch on GitHub [1]. The code on that branch is
>>>>>> similar to the current trunk except for the following changes:
>>>>>>
>>>>>> 1. Record IDs are always serialised in their entirety. As such, a
>>>>>> serialised record ID occupies 18 bytes instead of 3.
>>>>>>
>>>>>> 2. Because of the previous change, the table of referenced segment IDs
>>>>>> is not needed anymore, so I removed it from the segment header. It
>>>>>> turns out that this table is indeed needed for the mark phase of
>>>>>> compaction, so this feature is broken in that branch.
>>>>>>
>>>>>> Anyway, since the code is in a runnable state, I generated some
>>>>>> content using the current trunk and the dumber version of
>>>>>> oak-segment-tar. This is the repository created by the dumb
>>>>>> oak-segment-tar:
>>>>>>
>>>>>> 524744 data00000a.tar
>>>>>> 524584 data00001a.tar
>>>>>> 524688 data00002a.tar
>>>>>> 460896 data00003a.tar
>>>>>> 8 journal.log
>>>>>> 0 repo.lock
>>>>>>
>>>>>> This is the one created by the current trunk:
>>>>>>
>>>>>> 524864 data00000a.tar
>>>>>> 524656 data00001a.tar
>>>>>> 524792 data00002a.tar
>>>>>> 297288 data00003a.tar
>>>>>> 8 journal.log
>>>>>> 0 repo.lock
>>>>>>
>>>>>> The process that generates the content doesn't change between the two
>>>>>> executions, and the generated content is coming from a real world
>>>>>> scenario. For those familiar with it, the content is generated by an
>>>>>> installation of Adobe Experience Manager.
>>>>>>
>>>>>> It looks like that the size of the repository is not changing so much.
>>>>>> Probably the de-optimisation in the small is dwarfed by the binary
>>>>>> content in the large. Another effect of my change is that there is no
>>>>>> limit on the number of referenced segment IDs per segment, and this
>>>>>> might allow segments to pack more records than before.
>>>>>>
>>>>>> Questions apart, the clear advantage of this change is a great
>>>>>> simplification of the code. I guess I can remove some lines more, but
>>>>>> what I peeled off is already a considerable amount. Look at the code!
>>>>>>
>>>>>> Francesco
>>>>>>
>>>>>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>>>>>>
>>>>>
>>>>
>>>
>>

Re: Are dumb segments dumb?

Posted by Francesco Mari <ma...@gmail.com>.

I opened OAK-4596 to track the segment leak.
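
To illustrate the leak described in the quoted message: a freshly
created buffer already holds one record (the segment info), so a plain
"not empty" check passes on every flush tick. The following is only a
toy simulation under that assumption. The class and function names are
hypothetical and do not come from the Oak code base, and the stricter
check is not necessarily the fix that OAK-4596 ends up with.

```python
class SegmentBuffer:
    """Toy stand-in for a segment write buffer (hypothetical, not Oak's class)."""

    def __init__(self):
        # A fresh buffer already holds one record: the segment info.
        self.records = ["segment-info"]
        self.initial_count = len(self.records)

    def add(self, record):
        self.records.append(record)

    def is_empty(self):
        # Naive check: never true, because of the segment info record.
        return len(self.records) == 0

    def has_new_records(self):
        # Stricter check: only records added after creation count.
        return len(self.records) > self.initial_count


def run_flush_ticks(writes_per_tick, should_flush):
    """Simulate periodic flushes; return how many segments get written."""
    buffer = SegmentBuffer()
    segments_written = 0
    for writes in writes_per_tick:
        for i in range(writes):
            buffer.add("record-%d" % i)
        if should_flush(buffer):
            segments_written += 1
            buffer = SegmentBuffer()  # start a new segment after flushing
    return segments_written


def naive_check(buffer):
    return not buffer.is_empty()


def strict_check(buffer):
    return buffer.has_new_records()


# Four flush ticks, with real writes happening only before the third one.
ticks = [0, 0, 2, 0]
print(run_flush_ticks(ticks, naive_check))   # one segment per tick
print(run_flush_ticks(ticks, strict_check))  # only the tick with writes
```

With the naive check every idle tick produces an almost empty segment,
which is then padded by the TAR format as described below.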

2016-07-25 16:01 GMT+02:00 Francesco Mari <ma...@gmail.com>:
> I put together some statistics [1] for the process I described above.
> The "dumb" variant requires more segments to store the same amount of
> data, because of the increased size of serialised record IDs.  As you
> can see the amount of records per segment is definitely lower in the
> dumb variant.
>
> On the other hand, ignoring the growth of segment ID reference table
> seems to be a good choice. As shown from the segment size average,
> dumb segments are usually fuller than their counterparts. Moreover, a
> lower standard deviation shows that it's more common to have full dumb
> segments.
>
> In addition, my analysis seems to have found a bug too. There are a
> lot of segments with no segment ID references and only one record,
> which is very likely to be the segment info. The flush thread writes
> every 5 seconds the current segment buffer, provided that the buffer
> is not empty. It turns out that a segment buffer is never empty, since
> it always contains at least one record. As such, we are currently
> leaking almost empty segments every 5 seconds, that waste additional
> space on disk because of the padding required by the TAR format.
>
> [1]: https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>
> 2016-07-25 10:05 GMT+02:00 Michael Dürig <md...@apache.org>:
>>
>> Hi Jukka,
>>
>> Thanks for sharing your perspective and the historical background.
>>
>> I agree that repository size shouldn't be a primary concern. However, we
>> have seen many repositories (especially with an external data store) where
>> the content is extremely fine granular. Much more than in an initial content
>> installation of CQ (which I believe was one of the initial setup for
>> collecting statistics). So we should at least understand the impact of the
>> patch in various scenarios.
>>
>> My main concern is the cache footprint of node records. Those are made up of
>> a list of record ids and would thus grow by a factor of 6 with the current
>> patch.
>>
>> Locality is not so much of concern here. I would expect it to actually
>> improve as the patch gets rid of the 255 references limit of segments. A
>> limit which in practical deployments leads to degeneration of segment sizes
>> (I regularly see median sizes below 5k). See OAK-2896 for some background on
>> this.
>> Furthermore we already did a big step forward in improving locality in
>> concurrent write scenarios when we introduced the SegmentBufferWriterPool.
>> In essence: thread affinity for segments.
>>
>> We should probably be more carefully looking at the micro benchmarks. I
>> guess we neglected this part a bit in the past. Unfortunately CI
>> infrastructure isn't making this easy for us... OTOH those benchmarks only
>> tell you so much. Many of the problems we recently faced only surfaced in
>> the large: huge repos, high concurrent load, many days of traffic.
>>
>> Michael
>>
>>
>>
>>
>>
>> On 23.7.16 12:34 , Jukka Zitting wrote:
>>>
>>> Hi,
>>>
>>> Cool! I'm pretty sure there are various ways in which the format could be
>>> improved, as the original design was based mostly on intuition, guided
>>> somewhat by collected stats <http://markmail.org/message/kxe3iy2hnodxsghe>
>>> and
>>> the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119> used
>>> to optimize common operations.
>>>
>>> Note though that the total size of the repository was not and probably
>>> shouldn't be a primary metric, since the size of a typical repository is
>>> governed mostly by binaries and string properties (though it's a good idea
>>> to make sure you avoid things like duplicates of large binaries). Instead
>>> the rationale for squeezing things like record ids to as few bytes as
>>> possible is captured in the principles listed in the original design doc
>>> <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>
>>>    - Compactness. The formatting of records is optimized for size to
>>> reduce
>>>    IO costs and to fit as much content in caches as possible. A node
>>> stored in
>>>    SegmentNodeStore typically consumes only a fraction of the size it
>>> would as
>>>    a bundle in Jackrabbit Classic.
>>>    - Locality. Segments are written so that related records, like a node
>>>    and its immediate children, usually end up stored in the same segment.
>>> This
>>>    makes tree traversals very fast and avoids most cache misses for
>>> typical
>>>    clients that access more than one related node per session.
>>>
>>> Thus I would recommend keeping an eye also on benchmark results in
>>> addition
>>> to raw repository size when evaluating possible improvements. Also, the
>>> number and size of data segments are good size metrics to look at in
>>> addition to total disk usage.
>>>
>>> BR,
>>>
>>> Jukka Zitting
>>>
>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari <ma...@gmail.com>
>>> wrote:
>>>
>>>> The impact on repository size needs to be assessed with more specific
>>>> tests. In particular, I found RecordUsageAnalyserTest and
>>>> SegmentSizeTest unsuitable to this task. It's not a coincidence that
>>>> these tests are usually the first to be disabled or blindly updated
>>>> every time a small fix changes the size of the records.
>>>>
>>>> Regarding GC, the segment graph could be computed during the mark
>>>> phase. Of course, it's handy to have this information pre-computed for
>>>> you, but since the record graph is traversed anyway we could think
>>>> about dynamically reconstructing the segment graph when needed.
>>>>
>>>> There are still so many questions to answer, but I think that this
>>>> simplification exercise can be worth the effort.
>>>>
>>>> 2016-07-22 11:34 GMT+02:00 Michael Dürig <md...@apache.org>:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Neat! I would have expected a greater impact on the size of the segment
>>>>> store. But as you say it probably all depends on the binary/content
>>>>> ratio. I think we should look at the #references / repository size
>>>>> ratio for repositories of different structures and see how such a
>>>>> number differs with and without the patch.
>>>>>
>>>>> I like the patch as it fixes OAK-2896 while at the same time reducing
>>>>> complexity a lot.
>>>>>
>>>>> OTOH we need to figure out how to regain the lost functionality (e.g.
>>>>> gc) and assess its impact on repository size.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>> On 22.7.16 11:32 , Francesco Mari wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Yesterday I took some time for a little experiment: how many
>>>>>> optimisations can be removed from the current segment format while
>>>>>> maintaining the same functionality?
>>>>>>
>>>>>> I made some work in a branch on GitHub [1]. The code on that branch is
>>>>>> similar to the current trunk except for the following changes:
>>>>>>
>>>>>> 1. Record IDs are always serialised in their entirety. As such, a
>>>>>> serialised record ID occupies 18 bytes instead of 3.
>>>>>>
>>>>>> 2. Because of the previous change, the table of referenced segment IDs
>>>>>> is not needed anymore, so I removed it from the segment header. It
>>>>>> turns out that this table is indeed needed for the mark phase of
>>>>>> compaction, so this feature is broken in that branch.
>>>>>>
>>>>>> Anyway, since the code is in a runnable state, I generated some
>>>>>> content using the current trunk and the dumber version of
>>>>>> oak-segment-tar. This is the repository created by the dumb
>>>>>> oak-segment-tar:
>>>>>>
>>>>>> 524744 data00000a.tar
>>>>>> 524584 data00001a.tar
>>>>>> 524688 data00002a.tar
>>>>>> 460896 data00003a.tar
>>>>>> 8 journal.log
>>>>>> 0 repo.lock
>>>>>>
>>>>>> This is the one created by the current trunk:
>>>>>>
>>>>>> 524864 data00000a.tar
>>>>>> 524656 data00001a.tar
>>>>>> 524792 data00002a.tar
>>>>>> 297288 data00003a.tar
>>>>>> 8 journal.log
>>>>>> 0 repo.lock
>>>>>>
>>>>>> The process that generates the content doesn't change between the two
>>>>>> executions, and the generated content is coming from a real world
>>>>>> scenario. For those familiar with it, the content is generated by an
>>>>>> installation of Adobe Experience Manager.
>>>>>>
>>>>>> It looks like that the size of the repository is not changing so much.
>>>>>> Probably the de-optimisation in the small is dwarfed by the binary
>>>>>> content in the large. Another effect of my change is that there is no
>>>>>> limit on the number of referenced segment IDs per segment, and this
>>>>>> might allow segments to pack more records than before.
>>>>>>
>>>>>> Questions apart, the clear advantage of this change is a great
>>>>>> simplification of the code. I guess I can remove some lines more, but
>>>>>> what I peeled off is already a considerable amount. Look at the code!
>>>>>>
>>>>>> Francesco
>>>>>>
>>>>>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>>>>>>
>>>>>
>>>>
>>>
>>

Re: Are dumb segments dumb?

Posted by Jukka Zitting <ju...@gmail.com>.

Nice stats! I like the fact that this brings the median and mean numbers
closer to each other, and I agree with Michael's point about the
troublesome small segments.

The fact that the maximum number of segment ids seen in the dumb variant is
just 490 suggests an alternative design of storing record ids using 9 + 15
or 10 + 14 bits instead of the usual 8 + 16 bits in cases where the limit
of 255 segment ids gets exceeded. That would limit the size of such
segments to 128KiB or 64KiB instead of the normal 256KiB, but that would
already be a major improvement over the mentioned < 5KiB segments. This
design would avoid the reduction in the number of records per segment and
should have minimal impact on performance.
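
The arithmetic behind those numbers can be sketched as follows. This is
only an illustration, not code from Oak: it assumes a 3-byte (24-bit)
serialised record id and records aligned on 4-byte boundaries, which is
what makes 16 offset bits cover 256KiB. The names layout, pack and
unpack are made up for the example.

```python
ID_BITS = 24           # a serialised record id occupies 3 bytes
RECORD_ALIGN_BITS = 2  # assumption: records are aligned on 4-byte boundaries


def layout(segment_bits):
    """Return (segment id slots, max segment size in bytes) for a bit split."""
    offset_bits = ID_BITS - segment_bits
    slots = 1 << segment_bits
    max_size = 1 << (offset_bits + RECORD_ALIGN_BITS)
    return slots, max_size


def pack(segment_index, offset, segment_bits):
    """Pack a segment table index and a byte offset into a 24-bit integer."""
    offset_bits = ID_BITS - segment_bits
    if offset % (1 << RECORD_ALIGN_BITS) != 0:
        raise ValueError("offset is not aligned")
    aligned = offset >> RECORD_ALIGN_BITS
    if not 0 <= segment_index < (1 << segment_bits):
        raise ValueError("segment index does not fit in %d bits" % segment_bits)
    if not 0 <= aligned < (1 << offset_bits):
        raise ValueError("offset does not fit in %d bits" % offset_bits)
    return (segment_index << offset_bits) | aligned


def unpack(packed, segment_bits):
    """Inverse of pack: return (segment_index, byte offset)."""
    offset_bits = ID_BITS - segment_bits
    segment_index = packed >> offset_bits
    aligned = packed & ((1 << offset_bits) - 1)
    return segment_index, aligned << RECORD_ALIGN_BITS


for bits in (8, 9, 10):
    slots, size = layout(bits)
    print("%d + %d bits: %d slots, %d KiB" % (bits, ID_BITS - bits, slots, size // 1024))
```

A table index of 489 does not fit in the 8 + 16 split but fits in the
9 + 15 one, so pack(489, 1024, 9) succeeds while pack(489, 1024, 8)
raises an error.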

On the other hand I do appreciate the way "dumb" segments help reduce code
complexity. Having some benchmark data would make it easier to estimate
which trade-offs make the most sense.

BR,

Jukka Zitting

On Mon, Jul 25, 2016 at 10:01 AM Francesco Mari <ma...@gmail.com>
wrote:

> I put together some statistics [1] for the process I described above.
> The "dumb" variant requires more segments to store the same amount of
> data, because of the increased size of serialised record IDs.  As you
> can see the amount of records per segment is definitely lower in the
> dumb variant.
>
> On the other hand, ignoring the growth of segment ID reference table
> seems to be a good choice. As shown from the segment size average,
> dumb segments are usually fuller than their counterparts. Moreover, a
> lower standard deviation shows that it's more common to have full dumb
> segments.
>
> In addition, my analysis seems to have found a bug too. There are a
> lot of segments with no segment ID references and only one record,
> which is very likely to be the segment info. The flush thread writes
> every 5 seconds the current segment buffer, provided that the buffer
> is not empty. It turns out that a segment buffer is never empty, since
> it always contains at least one record. As such, we are currently
> leaking almost empty segments every 5 seconds, that waste additional
> space on disk because of the padding required by the TAR format.
>
> [1]:
> https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>
> 2016-07-25 10:05 GMT+02:00 Michael Dürig <md...@apache.org>:
> >
> > Hi Jukka,
> >
> > Thanks for sharing your perspective and the historical background.
> >
> > I agree that repository size shouldn't be a primary concern. However, we
> > have seen many repositories (especially with an external data store)
> where
> > the content is extremely fine granular. Much more than in an initial
> content
> > installation of CQ (which I believe was one of the initial setup for
> > collecting statistics). So we should at least understand the impact of
> the
> > patch in various scenarios.
> >
> > My main concern is the cache footprint of node records. Those are made
> up of
> > a list of record ids and would thus grow by a factor of 6 with the
> current
> > patch.
> >
> > Locality is not so much of concern here. I would expect it to actually
> > improve as the patch gets rid of the 255 references limit of segments. A
> > limit which in practical deployments leads to degeneration of segment
> sizes
> > (I regularly see median sizes below 5k). See OAK-2896 for some
> background on
> > this.
> > Furthermore we already did a big step forward in improving locality in
> > concurrent write scenarios when we introduced the
> SegmentBufferWriterPool.
> > In essence: thread affinity for segments.
> >
> > We should probably be more carefully looking at the micro benchmarks. I
> > guess we neglected this part a bit in the past. Unfortunately CI
> > infrastructure isn't making this easy for us... OTOH those benchmarks
> only
> > tell you so much. Many of the problems we recently faced only surfaced in
> > the large: huge repos, high concurrent load, many days of traffic.
> >
> > Michael

Re: Are dumb segments dumb?

Posted by Francesco Mari <ma...@gmail.com>.

I put together some statistics [1] for the process I described above.
The "dumb" variant requires more segments to store the same amount of
data, because of the increased size of serialised record IDs. As you
can see, the number of records per segment is definitely lower in the
dumb variant.

On the other hand, ignoring the growth of the segment ID reference table
seems to be a good choice. As the average segment size shows,
dumb segments are usually fuller than their counterparts. Moreover, a
lower standard deviation shows that it's more common to have full dumb
segments.
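
For anyone who wants to recompute the two figures, here is a minimal
sketch of the mean and the (population) standard deviation of segment
sizes. The class name SegmentSizeStats and the sample sizes are made up
for illustration; they are not taken from the spreadsheet or from Oak.

```java
// Hypothetical helper, not part of Oak: summary statistics over segment sizes.
final class SegmentSizeStats {

    // Arithmetic mean of the segment sizes, in bytes.
    static double mean(long[] sizes) {
        double sum = 0;
        for (long size : sizes) {
            sum += size;
        }
        return sum / sizes.length;
    }

    // Population standard deviation: a lower value means the sizes
    // cluster more tightly around the mean.
    static double stdDev(long[] sizes) {
        double m = mean(sizes);
        double squares = 0;
        for (long size : sizes) {
            squares += (size - m) * (size - m);
        }
        return Math.sqrt(squares / sizes.length);
    }

    public static void main(String[] args) {
        long[] sizes = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up sample values
        System.out.println("mean=" + mean(sizes) + " stdDev=" + stdDev(sizes));
    }
}
```

A high mean together with a low standard deviation is what "segments
are usually full" looks like in these two numbers.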

In addition, my analysis seems to have found a bug too. There are a
lot of segments with no segment ID references and only one record,
which is very likely to be the segment info. Every 5 seconds, the flush
thread writes the current segment buffer, provided that the buffer
is not empty. It turns out that a segment buffer is never empty, since
it always contains at least one record. As such, we are currently
leaking an almost empty segment every 5 seconds, which wastes additional
space on disk because of the padding required by the TAR format.
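
The flush condition can be illustrated with a small sketch. The class
SketchSegmentBuffer and its methods are hypothetical stand-ins, not the
actual Oak classes; the only assumption carried over from the analysis
is that every buffer starts out with a single segment info record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a segment buffer; names are illustrative, not Oak's API.
final class SketchSegmentBuffer {

    private final List<String> records = new ArrayList<>();

    SketchSegmentBuffer() {
        // Assumption: a fresh buffer always holds one segment info record.
        records.add("segment-info");
    }

    void add(String record) {
        records.add(record);
    }

    // Always true because of the initial info record, so a flush guarded
    // by this check writes a segment on every tick.
    boolean isNotEmpty() {
        return !records.isEmpty();
    }

    // Possible alternative guard: flush only if something besides the
    // initial info record was written.
    boolean hasContent() {
        return records.size() > 1;
    }

    public static void main(String[] args) {
        SketchSegmentBuffer idle = new SketchSegmentBuffer();
        System.out.println(idle.isNotEmpty() + " " + idle.hasContent());
    }
}
```

With a guard along the lines of hasContent(), an idle buffer would be
skipped by the periodic flush instead of ending up as a padded entry in
the TAR file.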

[1]: https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing


Re: Are dumb segments dumb?

Posted by Michael Dürig <md...@apache.org>.

Hi Jukka,

Thanks for sharing your perspective and the historical background.

I agree that repository size shouldn't be a primary concern. However, we
have seen many repositories (especially with an external data store)
where the content is extremely fine-grained, much more so than in an
initial content installation of CQ (which I believe was one of the
initial setups for collecting statistics). So we should at least
understand the impact of the patch in various scenarios.

My main concern is the cache footprint of node records. Those are made 
up of a list of record ids and would thus grow by a factor of 6 with the 
current patch.
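
A back-of-the-envelope sketch of that factor, using the 3 and 18 bytes
per serialised record ID mentioned at the start of the thread. The class
RecordIdFootprint and the count of ten record IDs are made up for
illustration; real node records carry more than just record IDs, so
treat the result as an upper bound on the growth.

```java
// Hypothetical helper, not part of Oak: bytes spent on record IDs in a list.
final class RecordIdFootprint {

    static final int COMPACT_ID_BYTES = 3; // trunk format, per this thread
    static final int FULL_ID_BYTES = 18;   // "dumb" branch, per this thread

    // Total bytes needed to serialise the given number of record IDs.
    static long bytesFor(int recordIds, int bytesPerId) {
        return (long) recordIds * bytesPerId;
    }

    public static void main(String[] args) {
        int ids = 10; // made-up number of record IDs in one node record
        long compact = bytesFor(ids, COMPACT_ID_BYTES);
        long full = bytesFor(ids, FULL_ID_BYTES);
        System.out.println(compact + " -> " + full + " bytes, factor " + (full / compact));
    }
}
```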

Locality is not so much of a concern here. I would expect it to actually
improve, as the patch gets rid of the limit of 255 segment references per
segment, a limit which in practical deployments leads to degenerate
segment sizes (I regularly see median sizes below 5k). See OAK-2896 for
some background on this.
Furthermore, we already took a big step forward in improving locality in
concurrent write scenarios when we introduced the
SegmentBufferWriterPool. In essence: thread affinity for segments.

We should probably look more carefully at the micro benchmarks. I
guess we neglected this part a bit in the past. Unfortunately, CI
infrastructure isn't making this easy for us... OTOH those benchmarks
only tell you so much. Many of the problems we recently faced only
surfaced in the large: huge repos, high concurrent load, many days of
traffic.

Michael





Re: Are dumb segments dumb?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

Cool! I'm pretty sure there are various ways in which the format could be
improved, as the original design was based mostly on intuition, guided
somewhat by collected stats <http://markmail.org/message/kxe3iy2hnodxsghe> and
the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119> used
to optimize common operations.

Note though that the total size of the repository was not and probably
shouldn't be a primary metric, since the size of a typical repository is
governed mostly by binaries and string properties (though it's a good idea
to make sure you avoid things like duplicates of large binaries). Instead
the rationale for squeezing things like record ids to as few bytes as
possible is captured in the principles listed in the original design doc
<http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:

   - Compactness. The formatting of records is optimized for size to reduce
   IO costs and to fit as much content in caches as possible. A node stored in
   SegmentNodeStore typically consumes only a fraction of the size it would as
   a bundle in Jackrabbit Classic.
   - Locality. Segments are written so that related records, like a node
   and its immediate children, usually end up stored in the same segment. This
   makes tree traversals very fast and avoids most cache misses for typical
   clients that access more than one related node per session.

Thus I would recommend also keeping an eye on benchmark results, in
addition to raw repository size, when evaluating possible improvements.
Also, the number and size of data segments are good size metrics to look
at in addition to total disk usage.

BR,

Jukka Zitting


Re: Are dumb segments dumb?

Posted by Francesco Mari <ma...@gmail.com>.

The impact on repository size needs to be assessed with more specific
tests. In particular, I found RecordUsageAnalyserTest and
SegmentSizeTest unsuitable for this task. It's not a coincidence that
these tests are usually the first to be disabled or blindly updated
every time a small fix changes the size of the records.

Regarding GC, the segment graph could be computed during the mark
phase. Of course, it's handy to have this information pre-computed for
you, but since the record graph is traversed anyway we could think
about dynamically reconstructing the segment graph when needed.
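
As a rough sketch of the idea, the segment graph can be derived as a
side effect of walking the record graph. Everything below is a
simplified model, not Oak code: record IDs are plain strings of the form
"segment:offset", the references between records are given as a map, and
SegmentGraphSketch is a made-up name.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: rebuilds segment-to-segment edges from record references.
final class SegmentGraphSketch {

    // Assumption: a record ID is written as "<segment>:<offset>".
    static String segmentOf(String recordId) {
        return recordId.substring(0, recordId.indexOf(':'));
    }

    // Walks every record reachable from the root and records an edge
    // whenever a record references a record in a different segment.
    static Map<String, Set<String>> build(String root, Map<String, List<String>> refs) {
        Map<String, Set<String>> graph = new TreeMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String id = stack.pop();
            if (!visited.add(id)) {
                continue; // record already marked
            }
            String from = segmentOf(id);
            graph.computeIfAbsent(from, k -> new TreeSet<>());
            for (String target : refs.getOrDefault(id, Collections.emptyList())) {
                String to = segmentOf(target);
                if (!to.equals(from)) {
                    graph.get(from).add(to);
                }
                stack.push(target);
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("s1:0", Arrays.asList("s1:8", "s2:0"));
        refs.put("s2:0", Arrays.asList("s3:4"));
        System.out.println(build("s1:0", refs));
    }
}
```

The trade-off is the one mentioned above: nothing is pre-computed in the
segment header, so the graph only exists after the traversal has run.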

There are still so many questions to answer, but I think that this
simplification exercise can be worth the effort.


Re: Are dumb segments dumb?

Posted by Michael Dürig <md...@apache.org>.

Hi,

Neat! I would have expected a greater impact on the size of the segment 
store. But as you say it probably all depends on the binary/content 
ratio. I think we should look at the #references / repository size ratio 
for repositories of different structures and see how such a number 
differs with and without the patch.

I like the patch as it fixes OAK-2896 while at the same time reducing 
complexity a lot.

OTOH we need to figure out how to regain the lost functionality (e.g.
gc) and assess its impact on repository size.

Michael

