Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2018/06/01 09:23:00 UTC

[jira] [Created] (JENA-1552) Bulk loader for TDB2 (phased loading)

Andy Seaborne created JENA-1552:
-----------------------------------

             Summary: Bulk loader for TDB2 (phased loading)
                 Key: JENA-1552
                 URL: https://issues.apache.org/jira/browse/JENA-1552
             Project: Apache Jena
          Issue Type: Improvement
          Components: TDB2
            Reporter: Andy Seaborne
            Assignee: Andy Seaborne


Following on from JENA-1550, this ticket is for phased loading, which combines features of the sequential loader and the parallel loader.

When building all the persistent data structures at the same time (the parallel loader), the work on different indexes competes for hardware resources: RAM and I/O bandwidth.  As the size of the data to load grows, this becomes a noticeable slowdown.

The sequential loader is at the other extreme of the design spectrum. It works on one index at a time so as to maximize caching efficiency.

Phased loading sits between the two: it splits the work into phases, each covering a subset of the indexes, and operates in parallel within each phase.
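
The idea can be sketched roughly as follows. This is only an illustration of the scheduling pattern, not the TDB2 loader code: the class PhasedLoadSketch, its methods, and the placeholder buildIndex step are all hypothetical, and the index names SPO, POS, OSP are used only as example labels. With perPhase = 1 the sketch degenerates to sequential behaviour, and with perPhase equal to the number of indexes it degenerates to fully parallel behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of phased scheduling; not the Jena/TDB2 API.
final class PhasedLoadSketch {

    /** Split the index names into consecutive groups of at most perPhase entries. */
    static List<List<String>> phases(List<String> indexes, int perPhase) {
        if (perPhase < 1) {
            throw new IllegalArgumentException("perPhase must be >= 1");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < indexes.size(); i += perPhase) {
            int end = Math.min(i + perPhase, indexes.size());
            result.add(new ArrayList<>(indexes.subList(i, end)));
        }
        return result;
    }

    /** Placeholder for the real work of building one index (assumption). */
    static String buildIndex(String name) {
        return name;
    }

    /**
     * Run the phases one after another. Within a phase, the indexes are
     * built in parallel, and the next phase starts only once every index
     * of the current phase has finished.
     */
    static List<List<String>> load(List<String> indexes, int perPhase) {
        List<List<String>> completed = new ArrayList<>();
        for (List<String> phase : phases(indexes, perPhase)) {
            ExecutorService pool = Executors.newFixedThreadPool(phase.size());
            try {
                List<Future<String>> futures = new ArrayList<>();
                for (String index : phase) {
                    Callable<String> task = () -> buildIndex(index);
                    futures.add(pool.submit(task));
                }
                List<String> built = new ArrayList<>();
                for (Future<String> f : futures) {
                    built.add(f.get()); // blocks until this index is done
                }
                completed.add(built);
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("phase failed: " + phase, e);
            } finally {
                pool.shutdown();
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // Two indexes at a time, as in the experiment described in this ticket.
        System.out.println(load(List.of("SPO", "POS", "OSP"), 2));
    }
}
```

Limiting each phase to a small number of indexes bounds how many structures compete for RAM and I/O bandwidth at once, while still allowing some parallelism.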

At 200m and loading to rotational disk, an experimental phased loader working with 2 indexes at a time starts to become faster than the parallel loader on the same hardware as used for the [figures in JENA-1550|https://issues.apache.org/jira/browse/JENA-1550#comment-16484269] (56K parallel, 76K phased).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)