You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2014/11/11 11:18:34 UTC

[jira] [Assigned] (JENA-813) In-memory DatasetGraph is extremely slow to iterate over when lots of named graphs are used

     [ https://issues.apache.org/jira/browse/JENA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse reassigned JENA-813:
------------------------------

    Assignee: Rob Vesse

> In-memory DatasetGraph is extremely slow to iterate over when lots of named graphs are used
> -------------------------------------------------------------------------------------------
>
>                 Key: JENA-813
>                 URL: https://issues.apache.org/jira/browse/JENA-813
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 2.12.1
>            Reporter: Rob Vesse
>            Assignee: Rob Vesse
>         Attachments: SlowDatasetGraph.java, SlowDatasetGraph.java
>
>
> Discovered this accidentally in the course of debugging a test case which seemed to running exceptionally slow.  Basically the in-memory {{DatasetGraph}} implementation will perform extremely poorly when trying to iterate all quads it contains.
> This problem manifests at relatively small scales and shows a marked difference between small numbers of quads in a single graph versus the quads in individual graphs.  In my testing with only a few thousand quads placing them in a single graph versus many individual graphs can see 5/6 orders of magnitude performance difference.
> I will attach code to reproduce and show my log output from running this on my system (Mac OS X Mavericks - Quad Core - 8GB RAM):
> {noformat}
> 0 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 1000 quads
> 57 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 90 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1 graphs
> 123 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.021191 seconds to iterate over 1000 quads in 1 graphs
> 123 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 144 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1000 graphs
> 861 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.716750 seconds to iterate over 1000 quads in 1000 graphs
> 861 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 2000 quads
> 861 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 885 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 1 graphs
> 890 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.005400 seconds to iterate over 2000 quads in 1 graphs
> 890 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 922 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 2000 graphs
> 7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 7.002919 seconds to iterate over 2000 quads in 2000 graphs
> 7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 3000 quads
> 7925 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7944 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 1 graphs
> 7948 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.003924 seconds to iterate over 3000 quads in 1 graphs
> 7948 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7973 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 3000 graphs
> 36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 28.749688 seconds to iterate over 3000 quads in 3000 graphs
> 36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 4000 quads
> 36723 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36727 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 1 graphs
> 36730 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000773 seconds to iterate over 4000 quads in 1 graphs
> 36730 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36741 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 4000 graphs
> 111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 74.896949 seconds to iterate over 4000 quads in 4000 graphs
> 111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 5000 quads
> 111638 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 111644 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 1 graphs
> 111645 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000989 seconds to iterate over 5000 quads in 1 graphs
> 111645 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 111655 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 5000 graphs
> 285026 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 173.370614 seconds to iterate over 5000 quads in 5000 graphs
> {noformat}
> From debugging my best guess is this is down to how {{DatasetGraphCollection}} (and it's hierarchy) build the quads iterator by iterating over the graph names and then iterating over the individual graphs appending the iterators together.  Something in the structuring means accessing and appending all those individual iterators is drastically slow compared to iterating a single graph.
> A related bug is that given a sufficiently large number of graphs attempts to iterate will fail with a {{StackOverflowError}}, the culprit in this case is {{IteratorCons}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)