You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2014/11/10 20:55:34 UTC

[jira] [Issue Comment Deleted] (JENA-813) In-memory DatasetGraph is extremely slow to iterate over when lots of named graphs are used

     [ https://issues.apache.org/jira/browse/JENA-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne updated JENA-813:
-------------------------------
    Comment: was deleted

(was: Try this, switching {{IteratorCons}} to {{IteratorConcat}}

{code:title=DatasetGraphCollection.java|borderStyle=solid}
@Override
    protected Iterator<Quad> findInAnyNamedGraphs(Node s, Node p, Node o)
    {
        Iterator<Node> gnames = listGraphNodes() ;
        IteratorConcat<Quad> iter = new IteratorConcat<>() ;
        
        // Named graphs
        for ( ; gnames.hasNext() ; )  
        {
            Node gn = gnames.next();
            Iterator<Quad> qIter = findInSpecificNamedGraph(gn, s, p, o) ;
            if ( qIter != null )
                // copes with null for iter
                //iter = Iter.append(iter, qIter) ;
                iter.add(qIter) ;
        }
        return iter ;
    }
{code})

> In-memory DatasetGraph is extremely slow to iterate over when lots of named graphs are used
> -------------------------------------------------------------------------------------------
>
>                 Key: JENA-813
>                 URL: https://issues.apache.org/jira/browse/JENA-813
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 2.12.1
>            Reporter: Rob Vesse
>         Attachments: SlowDatasetGraph.java
>
>
> Discovered this accidentally in the course of debugging a test case which seemed to running exceptionally slow.  Basically the in-memory {{DatasetGraph}} implementation will perform extremely poorly when trying to iterate all quads it contains.
> This problem manifests at relatively small scales and shows a marked difference between small numbers of quads in a single graph versus the quads in individual graphs.  In my testing with only a few thousand quads placing them in a single graph versus many individual graphs can see 5/6 orders of magnitude performance difference.
> I will attach code to reproduce and show my log output from running this on my system (Mac OS X Mavericks - Quad Core - 8GB RAM):
> {noformat}
> 0 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 1000 quads
> 31 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 44 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1 graphs
> 64 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.009692 seconds to iterate over 1000 quads in 1 graphs
> 64 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 85 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 1000 quads in 1000 graphs
> 779 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.693548 seconds to iterate over 1000 quads in 1000 graphs
> 779 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 2000 quads
> 779 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 788 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 1 graphs
> 788 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000047 seconds to iterate over 2000 quads in 1 graphs
> 788 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 821 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 2000 quads in 2000 graphs
> 7735 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 6.914044 seconds to iterate over 2000 quads in 2000 graphs
> 7735 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 3000 quads
> 7735 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7743 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 1 graphs
> 7743 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000032 seconds to iterate over 3000 quads in 1 graphs
> 7743 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 7767 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 3000 quads in 3000 graphs
> 36233 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 28.465885 seconds to iterate over 3000 quads in 3000 graphs
> 36233 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 4000 quads
> 36233 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36234 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 1 graphs
> 36235 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000033 seconds to iterate over 4000 quads in 1 graphs
> 36235 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 36253 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 4000 quads in 4000 graphs
> 109646 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 73.393265 seconds to iterate over 4000 quads in 4000 graphs
> 109646 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 5000 quads
> 109646 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 109648 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 1 graphs
> 109648 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000066 seconds to iterate over 5000 quads in 1 graphs
> 109648 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 109654 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 5000 quads in 5000 graphs
> 249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 140.212082 seconds to iterate over 5000 quads in 5000 graphs
> 249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 6000 quads
> 249867 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 249867 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 6000 quads in 1 graphs
> 249868 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000049 seconds to iterate over 6000 quads in 1 graphs
> 249868 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 249870 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 6000 quads in 6000 graphs
> 525942 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 276.071471 seconds to iterate over 6000 quads in 6000 graphs
> 525942 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 7000 quads
> 525943 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 525944 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 7000 quads in 1 graphs
> 525944 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000023 seconds to iterate over 7000 quads in 1 graphs
> 525945 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 525950 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 7000 quads in 7000 graphs
> 954218 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 428.267270 seconds to iterate over 7000 quads in 7000 graphs
> 954218 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Testing with 8000 quads
> 954218 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 954219 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 8000 quads in 1 graphs
> 954219 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 0.000010 seconds to iterate over 8000 quads in 1 graphs
> 954219 [main] DEBUG com.hp.hpl.jena.shared.LockMRSW  - Lock : main
> 954222 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - DatasetGraph implementation com.hp.hpl.jena.sparql.core.DatasetGraphMaker has 8000 quads in 8000 graphs
> 243342 [main] INFO org.apache.jena.playground.SlowDatasetGraph  - Took 289.119560 seconds to iterate over 8000 quads in 8000 graphs
> {noformat}
> From debugging my best guess is this is down to how {{DatasetGraphCollection}} (and it's hierarchy) build the quads iterator by iterating over the graph names and then iterating over the individual graphs appending the iterators together.  Something in the structuring means accessing and appending all those individual iterators is drastically slow compared to iterating a single graph.
> A related bug is that given a sufficiently large number of graphs attempts to iterate will fail with a {{StackOverflowError}}, the culprit in this case is {{IteratorCons}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)