Posted to pr@jena.apache.org by "Aklakan (via GitHub)" <gi...@apache.org> on 2023/01/29 18:07:15 UTC

[GitHub] [jena] Aklakan commented on pull request #1655: Use a skip scan based iterator for listing graph names in TDB2 (GH-1639)

Aklakan commented on PR #1655:
URL: https://github.com/apache/jena/pull/1655#issuecomment-1407731362

   We have the skip scan in use in a graph with around 1 billion triples and graph listings are super fast :+1: 
   
   Eventually, this feature would also need to be made publicly accessible through the various Tuple/DatasetGraph interfaces.
   My proposal looks like this:
   
   ```java
   // D = domain tuple type (e.g. Quad or Tuple<NodeId>), C = component type (e.g. Node or NodeId)
   interface QuadStore<D, C> {
     TupleStreamer<D, C> find(C g, C s, C p, C o, boolean distinct, int ... projectedColumns);
   }
   
   interface TupleStreamer<D, C> {
     Iterator<C> asComponents(); // e.g. Nodes or NodeIds
     Iterator<D> asDomainTuples(); // e.g. Quads
     Iterator<Tuple<C>> asGenericTuples();
   }
   ```
   
   ```sparql
   SELECT DISTINCT ?p { GRAPH ?g { ?s ?p ?o } }
   ```
   
   could then map to a `find()` call such as
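   ```java
   // The projected column 2 is the predicate, assuming columns are numbered g = 0, s = 1, p = 2, o = 3.
   Iterator<Node> distinctPredicates = datasetGraph.find(ANY, ANY, ANY, ANY, true, 2).asComponents();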
   ```
   
   Furthermore, I wonder whether the way TDB indexes data would also suit a skip scan for retrieving the distinct predicates of a single resource:
   ```sparql
   SELECT DISTINCT ?p { GRAPH ?g { <concreteS> ?p ?o } }
   ```
   The background is that we have resources with 4 million+ statements (admittedly not that usual) where a scan for distinct predicates takes seconds. Maybe the skip scan could also speed this case up?
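   
   To illustrate the idea, here is a minimal in-memory sketch, assuming an index sorted by subject, then predicate, then object. It is not TDB2 code: `SkipScanSketch`, `Triple`, and the integer ids are hypothetical stand-ins, and a `TreeSet` plays the role of the ordered index. The point is that each distinct predicate costs one seek, regardless of how many statements share it.
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   import java.util.NavigableSet;
   import java.util.TreeSet;
   
   // Hypothetical in-memory stand-in for an SPO-ordered index; not TDB2 code.
   class SkipScanSketch {
       // Integer ids stand in for NodeIds.
       record Triple(int s, int p, int o) {}
   
       // Sort order of the assumed index: subject, then predicate, then object.
       static final Comparator<Triple> SPO = Comparator
               .comparingInt(Triple::s)
               .thenComparingInt(Triple::p)
               .thenComparingInt(Triple::o);
   
       // Lists the distinct predicates of subject s with one seek per predicate.
       // The set must be ordered by SPO for the seeks to be meaningful.
       static List<Integer> distinctPredicates(NavigableSet<Triple> spoIndex, int s) {
           List<Integer> result = new ArrayList<>();
           // Seek to the first entry of subject s.
           Triple cur = spoIndex.ceiling(new Triple(s, Integer.MIN_VALUE, Integer.MIN_VALUE));
           while (cur != null && cur.s() == s) {
               result.add(cur.p());
               // Skip every remaining entry with the same (s, p) prefix in a single seek.
               cur = spoIndex.higher(new Triple(s, cur.p(), Integer.MAX_VALUE));
           }
           return result;
       }
   
       public static void main(String[] args) {
           NavigableSet<Triple> index = new TreeSet<>(SPO);
           index.add(new Triple(1, 10, 100));
           index.add(new Triple(1, 10, 101));
           index.add(new Triple(1, 20, 100));
           index.add(new Triple(2, 30, 100));
           System.out.println(distinctPredicates(index, 1)); // prints [10, 20]
       }
   }
   ```
   
   With 4 million statements but only a handful of distinct predicates on a resource, this would need a handful of seeks instead of a full range scan. Whether it carries over depends on an index with a matching sort order being available.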
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: pr-help@jena.apache.org