Posted to oak-dev@jackrabbit.apache.org by William Markmann <bi...@counterpointconsulting.com> on 2018/03/16 15:35:06 UTC

Decreased performance with many inserts

Has anyone experienced a significant slowdown when adding many documents
(tens or hundreds of thousands) to an Oak repository?

I'm using:


    <tika.version>1.7</tika.version>
    <jackrabbit.version>2.14.1</jackrabbit.version>
    <oak.version>1.8-SNAPSHOT</oak.version>
    <lucene.version>4.7.1</lucene.version>

and creating the repository (Spring Boot app) basically like:

    MBeanExporter mbe = new MBeanExporter();
    mbe.setServer(mbs);
    mbe.setNamingStrategy(new IdentityNamingStrategy());

    GCMonitor gcMonitor = new GCMonitorTracker();
    StatisticsProvider statisticsProvider =
            new MetricStatisticsProvider(mbs, Oak.defaultScheduledExecutor());

    FileStoreBuilder fsBuilder =
            FileStoreBuilder.fileStoreBuilder(new File(repoDirectory));
    fsBuilder.withGCMonitor(gcMonitor);
    fsBuilder.withIOMonitor(new MetricsIOMonitor(statisticsProvider));
    fsBuilder.withStatisticsProvider(statisticsProvider);

    this.fs = fsBuilder.build();

    SegmentNodeStoreBuilder nsBuilder = SegmentNodeStoreBuilders.builder(fs);
    nsBuilder.withStatisticsProvider(statisticsProvider);
    this.ns = nsBuilder.build();
    this.executor = Oak.defaultExecutorService();
    this.oak = new Oak(ns);
    this.oak.with(mbs);
    this.oak.withAsyncIndexing("async", 5);
    this.jcr = new Jcr(oak);
    this.repository = jcr.createRepository();


The basic problem is that I'm doing a data migration (~1 million docs) from
a legacy system.  When I start inserting the documents into Oak (the folder
structure is very flat), it absolutely flies in the beginning, but it slows
down significantly by the time I get to 75k or so documents.  Watching the
OneMinuteRate attribute of
"org.apache.jackrabbit.oak:name=oak.segment.segment-write-time,type=Metrics"
shows an 80% slowdown over the course of an hour or so.
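
To track that number over time instead of watching it in a JMX console, the
attribute could be polled from inside the JVM that runs the migration.  The
following is only a minimal sketch: it assumes the MBean is registered on the
platform MBeanServer and that the attribute is exposed as "OneMinuteRate".
The class and method names (RateProbe, readAttribute) are made up for
illustration and are not part of Oak.

```java
import java.lang.management.ManagementFactory;
import javax.management.InstanceNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper (not part of Oak): reads a single attribute of a single MBean.
class RateProbe {

    // Returns the attribute value, or null if no MBean is registered under that name.
    static Object readAttribute(MBeanServer server, String objectName, String attribute) {
        try {
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (InstanceNotFoundException e) {
            return null; // MBean not registered (yet)
        } catch (JMException e) {
            throw new IllegalStateException(
                    "Cannot read " + attribute + " of " + objectName, e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: Oak's MBeans live on the platform MBeanServer.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // MBean name as quoted in the post; the attribute name is assumed.
        String name =
                "org.apache.jackrabbit.oak:name=oak.segment.segment-write-time,type=Metrics";

        // Take one sample per minute for an hour and print it with a timestamp.
        for (int i = 0; i < 60; i++) {
            Object rate = readAttribute(server, name, "OneMinuteRate");
            System.out.println(System.currentTimeMillis() + " OneMinuteRate=" + rate);
            Thread.sleep(60_000);
        }
    }
}
```

Logging the samples with a timestamp makes it easier to line the slowdown up
with other events, such as the point where the indexer stops logging.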

Another observation, possibly related: when I first start out, I see the
"async" indexer running and logging as the folders and documents are being
created, but it stops logging anything within the first 20k inserts.

I should probably also add that each session doing the writes adds about 10
file nodes before syncing (i.e. I'm not using a session per file, and not
doing it all in one session).  The actual inserts are done by a thread pool
with about six workers simultaneously writing the files into Oak.
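
The write pattern looks roughly like the sketch below.  This is a simplified,
hypothetical model rather than the actual migration code: BatchWriter stands
in for a JCR session (add would create a file node, save would map to
Session.save()), and using one writer per batch is just one reading of the
description above.  The pool size of 6 and batch size of 10 are the
approximate figures mentioned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical model of the migration loop; none of these names come from Oak or JCR.
class BatchedMigration {

    // Stand-in for a JCR session: add() creates a node, save() persists the batch.
    interface BatchWriter {
        void add(String doc);
        void save();
    }

    // Splits docs into batches, writes each batch on a pool thread, saves once per batch.
    // Returns the number of documents written.
    static int migrate(List<String> docs, int workers, int batchSize,
                       Supplier<BatchWriter> writers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                List<String> batch =
                        docs.subList(start, Math.min(start + batchSize, docs.size()));
                results.add(pool.submit(() -> {
                    BatchWriter writer = writers.get(); // one writer per batch (assumption)
                    for (String doc : batch) {
                        writer.add(doc);
                    }
                    writer.save(); // single save for the whole batch
                    return batch.size();
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add("doc-" + i);
        }
        // A no-op writer, just to exercise the batching logic.
        int written = migrate(docs, 6, 10, () -> new BatchWriter() {
            public void add(String doc) { }
            public void save() { }
        });
        System.out.println("written=" + written);
    }
}
```

With the batch size pulled out as a parameter like this, it is easy to try
larger or smaller batches and see whether the slowdown changes.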

Has anyone else seen similar behavior?  Is there anything I should be
taking into account when moving so many files at once?

Any thoughts would be hugely appreciated.  Thanks!


-- 
*Bill Markmann*
*President | 866 809 0394 x 701*
*Counterpoint Consulting*
*Automate. Innovate. Accelerate.*
c20g.com | *Blog <http://www.c20g.com/site/blog> **| Linkedin
<http://www.linkedin.com/company/counterpoint-consulting-inc.>** | Twitter
<https://twitter.com/c20g>*