You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2014/11/11 11:01:40 UTC

[jira] [Comment Edited] (JENA-813) In-memory DatasetGraph is extremely slow to iterate over when lots of named graphs are used

    [ https://issues.apache.org/jira/browse/JENA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206224#comment-14206224 ] 

Andy Seaborne edited comment on JENA-813 at 11/11/14 10:01 AM:
---------------------------------------------------------------

With the new test case, I get

{noformat}
458 [main] INFO dev.SlowDatasetGraph  - Took 0.002670 seconds to iterate over 10000 quads in 1 graphs

482 [main] INFO dev.SlowDatasetGraph  - Took 0.013079 seconds to iterate over 10000 quads in 10000 graphs
{noformat}

with the fix above.  Seems reasonable for the general purpose dataset implementation.

Cleaned full output:
{noformat}
0 [main]   - Testing with 1000 quads
41 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1 graphs
61 [main]   - Took 0.012906 seconds to iterate over 1000 quads in 1 graphs

67 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1000 graphs
79 [main]   - Took 0.011935 seconds to iterate over 1000 quads in 1000 graphs
79 [main]   - Testing with 2000 quads

84 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 1 graphs
85 [main]   - Took 0.001112 seconds to iterate over 2000 quads in 1 graphs

90 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 2000 graphs
100 [main]   - Took 0.009100 seconds to iterate over 2000 quads in 2000 graphs
100 [main]   - Testing with 3000 quads

107 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 1 graphs
109 [main]   - Took 0.001506 seconds to iterate over 3000 quads in 1 graphs

124 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 3000 graphs
129 [main]   - Took 0.004989 seconds to iterate over 3000 quads in 3000 graphs
130 [main]   - Testing with 4000 quads

139 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 1 graphs
141 [main]   - Took 0.001675 seconds to iterate over 4000 quads in 1 graphs

149 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 4000 graphs
157 [main]   - Took 0.006983 seconds to iterate over 4000 quads in 4000 graphs
157 [main]   - Testing with 5000 quads

170 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 1 graphs
172 [main]   - Took 0.002355 seconds to iterate over 5000 quads in 1 graphs

193 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 5000 graphs
202 [main]   - Took 0.008339 seconds to iterate over 5000 quads in 5000 graphs
202 [main]   - Testing with 6000 quads

220 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 6000 quads in 1 graphs
232 [main]   - Took 0.002339 seconds to iterate over 6000 quads in 1 graphs

258 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 6000 quads in 6000 graphs
303 [main]   - Took 0.044652 seconds to iterate over 6000 quads in 6000 graphs
303 [main]   - Testing with 7000 quads

317 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 7000 quads in 1 graphs
326 [main]   - Took 0.008772 seconds to iterate over 7000 quads in 1 graphs

336 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 7000 quads in 7000 graphs
364 [main]   - Took 0.027352 seconds to iterate over 7000 quads in 7000 graphs
364 [main]   - Testing with 8000 quads

375 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 8000 quads in 1 graphs
377 [main]   - Took 0.002378 seconds to iterate over 8000 quads in 1 graphs

389 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 8000 quads in 8000 graphs
400 [main]   - Took 0.010900 seconds to iterate over 8000 quads in 8000 graphs
401 [main]   - Testing with 9000 quads

412 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 9000 quads in 1 graphs
415 [main]   - Took 0.002503 seconds to iterate over 9000 quads in 1 graphs

427 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 9000 quads in 9000 graphs
439 [main]   - Took 0.012150 seconds to iterate over 9000 quads in 9000 graphs
439 [main]   - Testing with 10000 quads

455 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 10000 quads in 1 graphs
458 [main]   - Took 0.002670 seconds to iterate over 10000 quads in 1 graphs

468 [main]   - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 10000 quads in 10000 graphs
482 [main]   - Took 0.013079 seconds to iterate over 10000 quads in 10000 graphs
{noformat}



was (Author: andy.seaborne):
With the new test case, I get

{noformat}
458 [main] INFO dev.SlowDatasetGraph  - Took 0.002670 seconds to iterate over 10000 quads in 1 graphs

482 [main] INFO dev.SlowDatasetGraph  - Took 0.013079 seconds to iterate over 10000 quads in 10000 graphs
{noformat}

with the fix above.  Seems reasonable for the general purpose dataset implementation.

> In-memory DatasetGraph is extremely slow to iterate over when lots of named graphs are used
> -------------------------------------------------------------------------------------------
>
>                 Key: JENA-813
>                 URL: https://issues.apache.org/jira/browse/JENA-813
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 2.12.1
>            Reporter: Rob Vesse
>         Attachments: SlowDatasetGraph.java, SlowDatasetGraph.java
>
>
> Discovered this accidentally in the course of debugging a test case which seemed to running exceptionally slow.  Basically the in-memory {{DatasetGraph}} implementation will perform extremely poorly when trying to iterate all quads it contains.
> This problem manifests at relatively small scales and shows a marked difference between small numbers of quads in a single graph versus the quads in individual graphs.  In my testing with only a few thousand quads placing them in a single graph versus many individual graphs can see 5/6 orders of magnitude performance difference.
> I will attach code to reproduce and show my log output from running this on my system (Mac OS X Mavericks - Quad Core - 8GB RAM):
> {noformat}
> 0 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 1000 quads
> 57 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 90 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1 graphs
> 123 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.021191 seconds to iterate over 1000 quads in 1 graphs
> 123 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 144 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1000 graphs
> 861 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.716750 seconds to iterate over 1000 quads in 1000 graphs
> 861 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 2000 quads
> 861 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 885 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 1 graphs
> 890 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.005400 seconds to iterate over 2000 quads in 1 graphs
> 890 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 922 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 2000 graphs
> 7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 7.002919 seconds to iterate over 2000 quads in 2000 graphs
> 7925 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 3000 quads
> 7925 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7944 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 1 graphs
> 7948 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.003924 seconds to iterate over 3000 quads in 1 graphs
> 7948 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7973 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 3000 graphs
> 36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 28.749688 seconds to iterate over 3000 quads in 3000 graphs
> 36723 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 4000 quads
> 36723 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36727 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 1 graphs
> 36730 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000773 seconds to iterate over 4000 quads in 1 graphs
> 36730 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36741 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 4000 graphs
> 111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 74.896949 seconds to iterate over 4000 quads in 4000 graphs
> 111638 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 5000 quads
> 111638 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 111644 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 1 graphs
> 111645 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000989 seconds to iterate over 5000 quads in 1 graphs
> 111645 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 111655 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 5000 graphs
> 285026 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 173.370614 seconds to iterate over 5000 quads in 5000 graphs
> {noformat}
> From debugging my best guess is this is down to how {{DatasetGraphCollection}} (and it's hierarchy) build the quads iterator by iterating over the graph names and then iterating over the individual graphs appending the iterators together.  Something in the structuring means accessing and appending all those individual iterators is drastically slow compared to iterating a single graph.
> A related bug is that given a sufficiently large number of graphs attempts to iterate will fail with a {{StackOverflowError}}, the culprit in this case is {{IteratorCons}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)