You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2014/11/10 16:52:34 UTC

[jira] [Created] (JENA-813) In-memory DatasetGraph is extremely slow to iterate over when lots of named graphs are used

Rob Vesse created JENA-813:
------------------------------

             Summary: In-memory DatasetGraph is extremely slow to iterate over when lots of named graphs are used
                 Key: JENA-813
                 URL: https://issues.apache.org/jira/browse/JENA-813
             Project: Apache Jena
          Issue Type: Bug
          Components: ARQ
    Affects Versions: Jena 2.12.1
            Reporter: Rob Vesse


Discovered this accidentally in the course of debugging a test case which seemed to running exceptionally slow.  Basically the in-memory {{DatasetGraph}} implementation will perform extremely poorly when trying to iterate all quads it contains.

This problem manifests at relatively small scales and shows a marked difference between small numbers of quads in a single graph versus the quads in individual graphs.  In my testing with only a few thousand quads placing them in a single graph versus many individual graphs can see 5/6 orders of magnitude performance difference.

I will attach code to reproduce and show my log output from running this on my system (Mac OS X Mavericks - Quad Core - 8GB RAM):

{noformat}
0 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 1000 quads
31 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
44 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1 graphs
64 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.009692 seconds to iterate over 1000 quads in 1 graphs
64 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
85 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1000 graphs
779 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.693548 seconds to iterate over 1000 quads in 1000 graphs
779 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 2000 quads
779 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
788 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 1 graphs
788 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000047 seconds to iterate over 2000 quads in 1 graphs
788 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
821 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 2000 graphs
7735 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 6.914044 seconds to iterate over 2000 quads in 2000 graphs
7735 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 3000 quads
7735 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
7743 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 1 graphs
7743 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000032 seconds to iterate over 3000 quads in 1 graphs
7743 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
7767 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 3000 graphs
36233 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 28.465885 seconds to iterate over 3000 quads in 3000 graphs
36233 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 4000 quads
36233 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
36234 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 1 graphs
36235 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000033 seconds to iterate over 4000 quads in 1 graphs
36235 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
36253 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 4000 graphs
109646 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 73.393265 seconds to iterate over 4000 quads in 4000 graphs
109646 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 5000 quads
109646 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
109648 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 1 graphs
109648 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000066 seconds to iterate over 5000 quads in 1 graphs
109648 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
109654 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 5000 graphs
249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 140.212082 seconds to iterate over 5000 quads in 5000 graphs
{noformat}

>From debugging my best guess is this is down to how {{DatasetGraphCollection}} (and it's hierarchy) build the quads iterator by iterating over the graph names and then iterating over the individual graphs appending the iterators together.  Something in the structuring means accessing and appending all those individual iterators is drastically slow compared to iterating a single graph.

A related bug is that given a sufficiently large number of graphs attempts to iterate will fail with a {{StackOverflowError}}, the culprit in this case is {{IteratorCons}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)