Posted to oak-dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2013/02/25 16:24:04 UTC

Large flat commit problems

Hi,

Two of our goals for Oak are support for large transactions and for
flat hierarchies. I combined these two goals into a simple benchmark
that tries to import the contents of a Wikipedia dump into an Oak
repository using just a single save() call.
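
For reference, the import loop is essentially the classic JCR pattern of
adding everything transiently under one flat parent and persisting with a
single save(). A minimal sketch, assuming placeholder node names, property
names and credentials (this is not the actual WikipediaImport benchmark
code, and the dump parsing is elided):

    import javax.jcr.Node;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;

    public class SingleSaveImportSketch {
        public static void importPages(Repository repository,
                Iterable<String[]> pages) throws Exception {
            Session session = repository.login(
                    new SimpleCredentials("admin", "admin".toCharArray()));
            try {
                // all pages become children of one flat parent node
                Node wiki = session.getRootNode().addNode("wiki");
                for (String[] page : pages) {      // page = [title, text]
                    Node node = wiki.addNode(page[0]);
                    node.setProperty("title", page[0]);
                    node.setProperty("text", page[1]);
                }
                session.save();                    // one single commit at the end
            } finally {
                session.logout();
            }
        }
    }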

Here are some initial numbers using the fairly small Faroese
wikipedia, with just some 12k pages.

The default H2 MK starts to slow down after 5k transient nodes and
fails after 6k:

$ java -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
      benchmark --wikipedia=fowiki-20130213-pages-articles.xml \
      WikipediaImport Oak-Default
Apache Jackrabbit Oak 0.7-SNAPSHOT
Wikipedia import (fowiki-20130213-pages-articles.xml)
Oak-Default: importing Wikipedia...
Imported 1000 pages in 1 seconds (1271us/page)
Imported 2000 pages in 2 seconds (1465us/page)
Imported 3000 pages in 4 seconds (1475us/page)
Imported 4000 pages in 6 seconds (1749us/page)
Imported 5000 pages in 11 seconds (2219us/page)
Imported 6000 pages in 28 seconds (4815us/page)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

The new MongoMK prototype fails even sooner:

$ java -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
      benchmark --wikipedia=fowiki-20130213-pages-articles.xml \
      WikipediaImport Oak-Mongo
Apache Jackrabbit Oak 0.7-SNAPSHOT
Wikipedia import (fowiki-20130213-pages-articles.xml)
Oak-Mongo: importing Wikipedia...
Imported 1000 pages in 1 seconds (1949us/page)
Imported 2000 pages in 6 seconds (3260us/page)
Imported 3000 pages in 13 seconds (4523us/page)
Imported 4000 pages in 30 seconds (7613us/page)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

After my recent work on OAK-632 the SegmentMK does better, but it also
experiences some slowdown over time:

$ java -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
      benchmark --wikipedia=fowiki-20130213-pages-articles.xml \
      WikipediaImport Oak-Segment
Apache Jackrabbit Oak 0.7-SNAPSHOT
Wikipedia import (fowiki-20130213-pages-articles.xml)
Oak-Segment: importing Wikipedia...
Imported 1000 pages in 1 seconds (1419us/page)
Imported 2000 pages in 2 seconds (1447us/page)
Imported 3000 pages in 4 seconds (1492us/page)
Imported 4000 pages in 6 seconds (1586us/page)
Imported 5000 pages in 8 seconds (1697us/page)
Imported 6000 pages in 10 seconds (1812us/page)
Imported 7000 pages in 13 seconds (1927us/page)
Imported 8000 pages in 16 seconds (2042us/page)
Imported 9000 pages in 19 seconds (2146us/page)
Imported 10000 pages in 22 seconds (2254us/page)
Imported 11000 pages in 25 seconds (2355us/page)
Imported 12000 pages in 29 seconds (2462us/page)
Imported 12148 pages in 41 seconds (3375us/page)

To summarize, all MKs still need some work on this. Once these initial
problems are solved, we can try the same benchmark with larger
Wikipedias.

PS. Note that I'm using the OAK-652 feature flag to speed things up at
the oak-jcr level.

BR,

Jukka Zitting

Re: Large flat commit problems

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Apr 29, 2013 at 10:17 AM, Lukas Eder <ma...@adobe.com> wrote:
> Do comparisons with Jackrabbit exist?

Not for this particular benchmark, since Jackrabbit 2.x is unable to
deal with such a large transaction (~400MB of non-binary content) or
the flat hierarchy (170k child nodes).

Some Jackrabbit comparisons are included in the other benchmark thread.

BR,

Jukka Zitting

Re: Large flat commit problems

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Apr 26, 2013 at 3:15 PM, Jukka Zitting <ju...@gmail.com> wrote:
>     Imported 171382 pages in 355 seconds (2.07ms/page)

For comparison, here's the result if I change the benchmark code to
call save() after every 1k pages:

    Imported 171382 pages in 1154 seconds (6.74ms/page)
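
The change is essentially just an intermediate save inside the import loop;
a sketch continuing the single-save example shown with the opening message
above (the counter variable is my addition, not the benchmark's):

    int count = 0;
    for (String[] page : pages) {
        Node node = wiki.addNode(page[0]);
        node.setProperty("title", page[0]);
        node.setProperty("text", page[1]);
        if (++count % 1000 == 0) {
            session.save();    // persist every 1k pages instead of only once
        }
    }
    session.save();            // final save for the remaining pages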

BR,

Jukka Zitting

RE: Large flat commit problems

Posted by Marcel Reutegger <mr...@adobe.com>.
> And with the TarMK:
> 
>     Added 171000 pages in 36 seconds (0.21ms/page)
>     Imported 171382 pages in 54 seconds (0.32ms/page)
>     [...]
>     Traversed 171382 pages in 2 seconds (0.01ms/page)
> 
> I particularly like that last line. :-)

very nice!

regards
 marcel

Re: Large flat commit problems

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

Great! If we continue like this, the TarMK will have negative values
next month :-)

Regards,
Thomas


On 5/30/13 2:45 PM, "Jukka Zitting" <ju...@gmail.com> wrote:

>Hi,
>
>On Fri, Apr 26, 2013 at 3:15 PM, Jukka Zitting <ju...@gmail.com>
>wrote:
>> On Wed, Feb 27, 2013 at 12:24 PM, Jukka Zitting
>><ju...@gmail.com> wrote:
>>>     Added 167000 pages in 467 seconds (2.80ms/page)
>>>     Imported 167404 pages in 1799 seconds (10.75ms/page)
>>
>>     Added 171000 pages in 166 seconds (0.97ms/page)
>>     Imported 171382 pages in 355 seconds (2.07ms/page)
>>     [...]
>>     Traversed 171382 pages in 27 seconds (0.16ms/page)
>>
>> Pretty good progress here.
>
>Getting better still:
>
>    Added 171000 pages in 61 seconds (0.36ms/page)
>    Imported 171382 pages in 194 seconds (1.14ms/page)
>    [...]
>    Traversed 171382 pages in 26 seconds (0.16ms/page)
>
>And with the TarMK:
>
>    Added 171000 pages in 36 seconds (0.21ms/page)
>    Imported 171382 pages in 54 seconds (0.32ms/page)
>    [...]
>    Traversed 171382 pages in 2 seconds (0.01ms/page)
>
>I particularly like that last line. :-)
>
>BR,
>
>Jukka Zitting


Re: Large flat commit problems

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Apr 26, 2013 at 3:15 PM, Jukka Zitting <ju...@gmail.com> wrote:
> On Wed, Feb 27, 2013 at 12:24 PM, Jukka Zitting <ju...@gmail.com> wrote:
>>     Added 167000 pages in 467 seconds (2.80ms/page)
>>     Imported 167404 pages in 1799 seconds (10.75ms/page)
>
>     Added 171000 pages in 166 seconds (0.97ms/page)
>     Imported 171382 pages in 355 seconds (2.07ms/page)
>     [...]
>     Traversed 171382 pages in 27 seconds (0.16ms/page)
>
> Pretty good progress here.

Getting better still:

    Added 171000 pages in 61 seconds (0.36ms/page)
    Imported 171382 pages in 194 seconds (1.14ms/page)
    [...]
    Traversed 171382 pages in 26 seconds (0.16ms/page)

And with the TarMK:

    Added 171000 pages in 36 seconds (0.21ms/page)
    Imported 171382 pages in 54 seconds (0.32ms/page)
    [...]
    Traversed 171382 pages in 2 seconds (0.01ms/page)

I particularly like that last line. :-)

BR,

Jukka Zitting

Re: Large flat commit problems

Posted by Lukas Eder <ma...@adobe.com>.
Hi,

On 4/26/13 2:15 PM, "Jukka Zitting" <ju...@gmail.com> wrote:

>Hi,
>
>On Wed, Feb 27, 2013 at 12:24 PM, Jukka Zitting <ju...@gmail.com>
>wrote:
>>     Added 167000 pages in 467 seconds (2.80ms/page)
>>     Imported 167404 pages in 1799 seconds (10.75ms/page)
>
>Here's an update on the latest status with the Wikipedia import benchmark:
>
>    $ java -Xmx1500m -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
>          benchmark --wikipedia=simplewiki-20130414-pages-articles.xml \
>          --cache=200  WikipediaImport Oak-Segment
>    [...]
>    Added 171000 pages in 166 seconds (0.97ms/page)
>    Imported 171382 pages in 355 seconds (2.07ms/page)
>    [...]
>    Traversed 171382 pages in 27 seconds (0.16ms/page)
>
>Pretty good progress here.

Those are impressive numbers. Do comparisons with Jackrabbit exist?

Cheers
Lukas

>> There are still a few problems, most notably the fact that the index update
>> hook operates directly on the plain MemoryNodeBuilder used by the
>> current SegmentMK, so it won't benefit from the automatic purging of
>> large change-sets and thus ends up requiring lots of memory during the
>> massive final save() call. Something like a SegmentNodeBuilder with
>> internal purge logic similar to what we already prototyped in
>> KernelNodeState should solve that issue.
>
>This is still an issue, see the -Xmx1500m I used for the import.
>
>> The other big issue is the large amount of time spent processing the
>> commit hooks. The one hook approach I outlined earlier should help us
>> there.
>
>The work we've done here with the Editor mechanism is clearly paying
>off as the commit hooks are now taking some 53% of the import time,
>down from 74% two months ago, even though we've been adding more
>functionality there.
>
>BR,
>
>Jukka Zitting


Re: Large flat commit problems

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Feb 27, 2013 at 12:24 PM, Jukka Zitting <ju...@gmail.com> wrote:
>     Added 167000 pages in 467 seconds (2.80ms/page)
>     Imported 167404 pages in 1799 seconds (10.75ms/page)

Here's an update on the latest status with the Wikipedia import benchmark:

    $ java -Xmx1500m -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
          benchmark --wikipedia=simplewiki-20130414-pages-articles.xml \
          --cache=200  WikipediaImport Oak-Segment
    [...]
    Added 171000 pages in 166 seconds (0.97ms/page)
    Imported 171382 pages in 355 seconds (2.07ms/page)
    [...]
    Traversed 171382 pages in 27 seconds (0.16ms/page)

Pretty good progress here.

> There are still a few problems, most notably the fact that the index update
> hook operates directly on the plain MemoryNodeBuilder used by the
> current SegmentMK, so it won't benefit from the automatic purging of
> large change-sets and thus ends up requiring lots of memory during the
> massive final save() call. Something like a SegmentNodeBuilder with
> internal purge logic similar to what we already prototyped in
> KernelNodeState should solve that issue.

This is still an issue, see the -Xmx1500m I used for the import.

> The other big issue is the large amount of time spent processing the
> commit hooks. The one hook approach I outlined earlier should help us
> there.

The work we've done here with the Editor mechanism is clearly paying
off as the commit hooks are now taking some 53% of the import time,
down from 74% two months ago, even though we've been adding more
functionality there.

BR,

Jukka Zitting

Re: Large flat commit problems

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Feb 26, 2013 at 2:19 PM, Chetan Mehrotra
<ch...@gmail.com> wrote:
> I modified the importer logic to use a custom nodeType similar to
> SlingFolder (no orderable nodes), and the following are the results:

Thanks! It indeed looks like orderability is an issue here. With
the oak:unstructured type I added in OAK-657 and a few more
improvements and fixes to the SegmentMK I can now also import the
Simplified English wiki, with 167k pages:

    $ java -Xmx500m -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
          benchmark --wikipedia=simplewiki-20130214-pages-articles.xml \
          --cache=200 WikipediaImport Oak-Segment
    Apache Jackrabbit Oak 0.7-SNAPSHOT
    Oak-Segment: Wikipedia import benchmark
    Importing simplewiki-20130214-pages-articles.xml...
    Added 1000 pages in 1 seconds (1.35ms/page)
    [...]
    Added 167000 pages in 467 seconds (2.80ms/page)
    Imported 167404 pages in 1799 seconds (10.75ms/page)

Transient operations slow down slightly over time, mostly because
initially everything is cached and cache misses become more frequent
later on. Note the new --cache option that can be used to control
the size (in MB) of the segment cache. Ideally, for better comparison,
we'd also make it control the cache used by the MongoMK.

There are still a few problems, most notably the fact that the index update
hook operates directly on the plain MemoryNodeBuilder used by the
current SegmentMK, so it won't benefit from the automatic purging of
large change-sets and thus ends up requiring lots of memory during the
massive final save() call. Something like a SegmentNodeBuilder with
internal purge logic similar to what we already prototyped in
KernelNodeState should solve that issue.
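
To make the purge idea concrete, here is a rough and purely hypothetical
sketch of the mechanism being described; none of the names below are actual
Oak classes, and the threshold is an arbitrary assumption:

    // Hypothetical sketch only: count transient updates and, past a threshold,
    // write the accumulated changes to the backing store and rebase on the
    // persisted state, so heap usage stays flat during huge commits.
    class PurgingBuilderSketch {

        interface Store {                        // stand-in for the segment store
            Object persist(Object transientState);
        }

        private static final int UPDATE_LIMIT = 10_000;  // arbitrary threshold

        private final Store store;
        private Object baseState;                // last persisted state
        private Object transientState;           // in-memory changes on top of it
        private int updates = 0;

        PurgingBuilderSketch(Store store, Object baseState) {
            this.store = store;
            this.baseState = baseState;
            this.transientState = baseState;
        }

        void changed() {                         // called for every transient change
            if (++updates >= UPDATE_LIMIT) {
                baseState = store.persist(transientState);   // purge to storage
                transientState = baseState;                   // rebase, freeing heap
                updates = 0;
            }
        }
    }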

The other big issue is the large amount of time spent processing the
commit hooks. The one hook approach I outlined earlier should help us
there.

BR,

Jukka Zitting

Re: Large flat commit problems

Posted by Chetan Mehrotra <ch...@gmail.com>.
I modified the importer logic to use a custom nodeType similar to
SlingFolder (no orderable nodes), and the following are the results:

Segment MK
------------------

05:30:31 {benchmark} ~/git/apache/jackrabbit-oak$ java -DOAK-652=true -jar \
      oak-run/target/oak-run-0.7-SNAPSHOT.jar benchmark \
      --wikipedia=/home/chetanm/data/oak/fowiki-20130213-pages-articles.xml \
      --port=27018 WikipediaImport Oak-Segment
Apache Jackrabbit Oak 0.7-SNAPSHOT
Oak-Segment: importing /home/chetanm/data/oak/fowiki-20130213-pages-articles.xml...
Added 1000 pages in 6 seconds (6.34ms/page)
Added 2000 pages in 8 seconds (4.45ms/page)
Added 3000 pages in 11 seconds (3.67ms/page)
Added 4000 pages in 13 seconds (3.29ms/page)
Added 5000 pages in 15 seconds (3.04ms/page)
Added 6000 pages in 17 seconds (2.88ms/page)
Added 7000 pages in 19 seconds (2.81ms/page)
Added 8000 pages in 22 seconds (2.77ms/page)
Added 9000 pages in 24 seconds (2.76ms/page)
Added 10000 pages in 27 seconds (2.75ms/page)
Added 11000 pages in 30 seconds (2.75ms/page)
Added 12000 pages in 32 seconds (2.69ms/page)
Imported 12148 pages in 86 seconds (7.14ms/page)

Mongo MK
----------------

05:32:21 {benchmark} ~/git/apache/jackrabbit-oak$ java -DOAK-652=true -jar \
      oak-run/target/oak-run-0.7-SNAPSHOT.jar benchmark \
      --wikipedia=/home/chetanm/data/oak/fowiki-20130213-pages-articles.xml \
      --port=27018 WikipediaImport Oak-Mongo
Apache Jackrabbit Oak 0.7-SNAPSHOT
Oak-Mongo: importing /home/chetanm/data/oak/fowiki-20130213-pages-articles.xml...
Added 1000 pages in 4 seconds (4.84ms/page)
Added 2000 pages in 7 seconds (3.53ms/page)
Added 3000 pages in 9 seconds (3.33ms/page)
Added 4000 pages in 12 seconds (3.14ms/page)
Added 5000 pages in 14 seconds (2.93ms/page)
Added 6000 pages in 18 seconds (3.02ms/page)
Added 7000 pages in 22 seconds (3.16ms/page)
Added 8000 pages in 26 seconds (3.33ms/page)
Added 9000 pages in 29 seconds (3.30ms/page)
Added 10000 pages in 34 seconds (3.49ms/page)
Added 11000 pages in 53 seconds (4.88ms/page)
Added 12000 pages in 70 seconds (5.84ms/page)
Imported 12148 pages in 72 seconds (5.99ms/page)


This includes some cache-related changes done by Thomas today. Both
tests pass with no OOM.

Further, with nt:unstructured nodes I was getting an error from MongoMK
about the document size exceeding the limit, which I think was due to
keeping multiple revisioned copies of the :childOrder array. This would
be addressed going forward by moving older revisions to a separate node
or removing them altogether if possible.

Chetan Mehrotra


On Tue, Feb 26, 2013 at 4:12 PM, Marcel Reutegger <mr...@adobe.com> wrote:

> > I didn't analyze the results, but could the problem be orderable child
> > nodes? Currently, oak-core stores a property ":childOrder".
>
> no, the problem is how oak-core detects changes between two node
> state revisions. for a node with many child nodes in two revisions,
> oak-core currently loads all children in both revisions to find out
> which child nodes were added, removed, changed or didn't change at all.
>
> I'm currently working on this issue in KernelNodeState by leveraging
> the MK.diff(). right now it simply checks if there are differences, but
> doesn't make use of the information. this should bring the cost down
> to O(N) where N is the number of modified child nodes.
>
> Please note this requires a correct implementation of MK.diff()!
>
> Regards
>  Marcel
>

RE: Large flat commit problems

Posted by Marcel Reutegger <mr...@adobe.com>.
> I didn't analyze the results, but could the problem be orderable child
> nodes? Currently, oak-core stores a property ":childOrder".

no, the problem is how oak-core detects changes between two node
state revisions. for a node with many child nodes in two revisions, oak-core
currently loads all children in both revisions to find out which child nodes
were added, removed, changed or didn't change at all.

I'm currently working on this issue in KernelNodeState by leveraging
the MK.diff(). right now it simply checks if there are differences, but
doesn't make use of the information. this should bring the cost down
to O(N) where N is the number of modified child nodes.

Please note this requires a correct implementation of MK.diff()! 
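
To illustrate the difference in cost, here is a toy sketch of the two
approaches; the types are deliberately simplified placeholders (plain maps
of child names to revision ids), not oak-core classes:

    import java.util.HashMap;
    import java.util.Map;

    class ChildDiffSketch {

        // Naive diff: touches every child of both revisions, so a single
        // changed child under a 170k-child parent still costs ~170k lookups.
        static Map<String, String> naiveDiff(Map<String, String> before,
                                             Map<String, String> after) {
            Map<String, String> changes = new HashMap<>();
            for (Map.Entry<String, String> e : after.entrySet()) {
                String old = before.get(e.getKey());
                if (old == null) {
                    changes.put(e.getKey(), "added");
                } else if (!old.equals(e.getValue())) {
                    changes.put(e.getKey(), "changed");
                }
            }
            for (String name : before.keySet()) {
                if (!after.containsKey(name)) {
                    changes.put(name, "removed");
                }
            }
            return changes;
        }

        // The MK.diff() approach instead asks the MicroKernel for the list of
        // changes between the two revisions, so the work is proportional only
        // to the number of modified children, i.e. the O(N) mentioned above.
    }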

Regards
 Marcel

Re: Large flat commit problems

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

I created OAK-656 "Large number of child nodes not working well with
orderable node types".

Until this is fixed, I guess we could use nt:folder (which is unordered).

Regards,
Thomas


On 2/26/13 11:22 AM, "Tommaso Teofili" <te...@adobe.com> wrote:

>
>On 26/feb/2013, at 11:12, Jukka Zitting wrote:
>
>> Hi,
>> 
>> On Tue, Feb 26, 2013 at 12:04 PM, Thomas Mueller <mu...@adobe.com>
>>wrote:
>>> Large transactions: I think we didn't define this as a strict
>>>requirement.
>> 
>> It's probably not the most important thing for Oak to achieve, but we
>> did list it as a goal in
>> 
>>http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203:
>> 
>> * Big transactions (> 100k nodes at 1kB each)
>
>I agree it's important; especially for future evaluation of Oak by
>newcomers, these are common metrics.
>
>> 
>>> I didn't analyze the results, but could the problem be orderable child
>>> nodes?
>> 
>> That may well be, in the benchmark code I don't explicitly specify a
>> non-orderable node type so it defaults to the orderable
>> nt:unstructured.
>
>Since the slowing trend is common, even if different, across the MK
>implementations, maybe it's also something related to the data structures
>holding stuff in memory.
>In my opinion it'd be good to investigate further in order to catch this
>sort of thing as early as possible.
>
>Tommaso
>
>> 
>> BR,
>> 
>> Jukka Zitting
>


Re: Large flat commit problems

Posted by Tommaso Teofili <te...@adobe.com>.
On 26/feb/2013, at 11:12, Jukka Zitting wrote:

> Hi,
> 
> On Tue, Feb 26, 2013 at 12:04 PM, Thomas Mueller <mu...@adobe.com> wrote:
>> Large transactions: I think we didn't define this as a strict requirement.
> 
> It's probably not the most important thing for Oak to achieve, but we
> did list it as a goal in
> http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203:
> 
> * Big transactions (> 100k nodes at 1kB each)

I agree it's important; especially for future evaluation of Oak by newcomers, these are common metrics.

> 
>> I didn't analyze the results, but could the problem be orderable child
>> nodes?
> 
> That may well be, in the benchmark code I don't explicitly specify a
> non-orderable node type so it defaults to the orderable
> nt:unstructured.

Since the slowing trend is common, even if different, across the MK implementations, maybe it's also something related to the data structures holding stuff in memory.
In my opinion it'd be good to investigate further in order to catch this sort of thing as early as possible.

Tommaso

> 
> BR,
> 
> Jukka Zitting


RE: Large flat commit problems

Posted by Marcel Reutegger <mr...@adobe.com>.
> > I didn't analyze the results, but could the problem be orderable child
> > nodes?
> 
> That may well be, in the benchmark code I don't explicitly specify a
> non-orderable node type so it defaults to the orderable
> nt:unstructured.

I think we have to make a compromise here: either you get orderable child
nodes or efficient flat hierarchies, but not both.

regards
 marcel

Re: Large flat commit problems

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Feb 26, 2013 at 12:04 PM, Thomas Mueller <mu...@adobe.com> wrote:
> Large transactions: I think we didn't define this as a strict requirement.

It's probably not the most important thing for Oak to achieve, but we
did list it as a goal in
http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203:

* Big transactions (> 100k nodes at 1kB each)

> I didn't analyze the results, but could the problem be orderable child
> nodes?

That may well be, in the benchmark code I don't explicitly specify a
non-orderable node type so it defaults to the orderable
nt:unstructured.
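
For illustration, the difference is just the node type passed to addNode();
a fragment building on the import sketch in the opening message (the
oak:unstructured type from OAK-657 is mentioned elsewhere in this thread,
and its exact registered name is an assumption here):

    // Fragment only; assumes a parent Node "wiki" and a String "title" in scope.

    // Default: no type given, so the page defaults to the orderable
    // nt:unstructured and every addition grows the parent's child-order list.
    Node orderablePage = wiki.addNode(title);

    // Alternative discussed in this thread: an explicitly unordered type,
    // e.g. the oak:unstructured type from OAK-657 (exact registered name is
    // an assumption here) or nt:folder as suggested elsewhere in the thread.
    Node unorderedPage = wiki.addNode(title, "oak:unstructured");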

BR,

Jukka Zitting

Re: Large flat commit problems

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

Large transactions: I think we didn't define this as a strict requirement.
I'm not aware that we ran into big trouble with Jackrabbit 2.x, where this
is not supported. For me, this is still a nice-to-have. But of course it's
something we should test and try to achieve (and resolve problems if we
find any).

Flat hierarchies: Yes, this is important (we ran into this problem many
times).

I didn't analyze the results, but could the problem be orderable child
nodes? Currently, oak-core stores a property ":childOrder". If there are
many child nodes, then this property gets larger and larger. This is a
problem, as it consumes more and more disk space / network bandwidth /
cpu, on the order of n^2. It's the same problem as with storing the list of
children in the node bundle. So I guess this needs to be solved in
oak-core (not in each MK separately)?
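
A quick back-of-the-envelope illustration of the quadratic growth, under the
simplifying assumption that one child is added per persisted change and the
full :childOrder list is rewritten each time (numbers purely illustrative,
not measurements):

    // Illustrative arithmetic only, not Oak code: rewriting the full child
    // list on every addition makes the cumulative data written grow as n^2.
    long totalBytes = 0;
    int nameLength = 20;                            // assumed average name length
    for (int children = 1; children <= 170_000; children++) {
        totalBytes += (long) children * nameLength; // whole list rewritten again
    }
    System.out.println(totalBytes / 1_000_000_000); // roughly 289 GB written in total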

Regards,
Thomas