Posted to users@jena.apache.org by Thomas Francart <th...@sparna.fr> on 2022/04/27 21:15:11 UTC

Re: SHACL-based data extraction from a knowledge graph

Just to let you know - I've given up on this, and am instead automating the
generation of a set of SPARQL CONSTRUCT queries built from the parsing
of (a very limited subset of) SHACL constraints (following
sh:property/sh:path+sh:node recursively, and using sh:in, sh:hasValue and
sh:languageIn to filter the values to pull from the graph).
It doesn't guarantee 100% conformity with the SHACL definition, especially
with respect to cardinalities - but if the underlying graph doesn't match the
specified shapes, then it is not the objective of this extraction step to
make it conformant.
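A rough sketch of that approach, assuming a hypothetical, already-parsed shape model: the `PropertyShape`/`NodeShape` records and `toConstruct` are illustrative names, not Jena API. It assumes sh:targetClass for the focus nodes, handles only single-predicate sh:path values, and expects sh:in/sh:hasValue values to be pre-serialized SPARQL terms. Each sh:property becomes an OPTIONAL block, sh:node recurses on the value variable, and sh:in, sh:hasValue and sh:languageIn become FILTERs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: turn a (hypothetical, heavily simplified) parsed SHACL shape tree
// into one SPARQL CONSTRUCT query. Values in `in` and `hasValue` are assumed
// to be already serialized as SPARQL terms, e.g. "<http://ex.org/A>".
final class ShapeToConstruct {

    // Stand-in for one sh:property: sh:path (a single predicate IRI), an optional
    // nested sh:node, and the value filters sh:in, sh:hasValue, sh:languageIn.
    record PropertyShape(String path, NodeShape node, List<String> in,
                         String hasValue, List<String> languageIn) {}

    record NodeShape(List<PropertyShape> properties) {}

    private int counter = 0; // mints fresh variable names ?v0, ?v1, ...

    static String toConstruct(String targetClass, NodeShape shape) {
        ShapeToConstruct gen = new ShapeToConstruct();
        List<String> template = new ArrayList<>();
        StringBuilder where = new StringBuilder();
        where.append("  ?focus a <").append(targetClass).append("> .\n");
        gen.walk("?focus", shape, template, where, "  ");
        return "CONSTRUCT {\n" + String.join("\n", template) + "\n}\nWHERE {\n" + where + "}";
    }

    private void walk(String subject, NodeShape shape, List<String> template,
                      StringBuilder where, String indent) {
        for (PropertyShape p : shape.properties()) {
            String value = "?v" + (counter++);
            String triple = subject + " <" + p.path() + "> " + value + " .";
            template.add("  " + triple);
            // OPTIONAL: absent values are simply not extracted; cardinalities
            // (sh:minCount / sh:maxCount) are deliberately not enforced.
            where.append(indent).append("OPTIONAL {\n");
            String inner = indent + "  ";
            where.append(inner).append(triple).append('\n');
            if (p.hasValue() != null) {
                where.append(inner).append("FILTER(sameTerm(").append(value)
                     .append(", ").append(p.hasValue()).append("))\n");
            }
            if (p.in() != null && !p.in().isEmpty()) {
                where.append(inner).append("FILTER(").append(value).append(" IN (")
                     .append(String.join(", ", p.in())).append("))\n");
            }
            if (p.languageIn() != null && !p.languageIn().isEmpty()) {
                // sh:languageIn is defined in terms of SPARQL langMatches
                String langs = p.languageIn().stream()
                        .map(l -> "langMatches(lang(" + value + "), \"" + l + "\")")
                        .collect(Collectors.joining(" || "));
                where.append(inner).append("FILTER(").append(langs).append(")\n");
            }
            if (p.node() != null) {
                walk(value, p.node(), template, where, inner); // follow sh:node recursively
            }
            where.append(indent).append("}\n");
        }
    }

    public static void main(String[] args) {
        NodeShape address = new NodeShape(List.of(
                new PropertyShape("http://example.org/city", null, null, null, null)));
        NodeShape person = new NodeShape(List.of(
                new PropertyShape("http://example.org/name", null, null, null, List.of("en", "fr")),
                new PropertyShape("http://example.org/address", address, null, null, null)));
        System.out.println(toConstruct("http://example.org/Person", person));
    }
}
```

Wrapping every property in OPTIONAL keeps the extraction permissive, matching the choice not to enforce cardinalities. With several multi-valued properties per shape, sibling OPTIONALs multiply the solutions (CONSTRUCT still deduplicates the output), so one UNION branch per property may scale better.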

I will probably release some code when ready.

Thomas


On Mon, 28 Mar 2022 at 21:26, Andy Seaborne <an...@apache.org> wrote:

> Some inspiration from ShEx may help. The "validation" process is defined
> by assigning triples to non-overlapping partitions defined by
> constraints. There can be more than one way to partition the triples in
> a disjunction or conjunction when there are multiple occurrences of
> triples matching multiple constraints in the conjunction (OR and AND in
> ShExC).
>
> The process can involve backtracking to search through alternatives
> (it's like a string regex, except that a "bag regex" assigns triples to
> bags as the regex passes over them). It's also more "closed" by default
> in style.
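To make the "bag regex" idea concrete, here is a small Java sketch. It is not ShEx's actual algorithm: it models only a conjunction of triple constraints on one focus node, treats the shape as closed, and `BagPartition` and its records are hypothetical names. Each triple goes into exactly one constraint's bag, and the search backtracks when a choice leads to a dead end:

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch of ShEx-style "use once" matching for a single focus node: every
// triple must land in exactly one constraint's bag, within [min, max].
final class BagPartition {

    record Triple(String p, String o) {} // subject omitted: it is the focus node

    record Constraint(String name, String predicate, Predicate<String> valueTest,
                      int min, int max) {}

    /** Returns one valid partition, or null if none exists. */
    static Map<Constraint, List<Triple>> partition(List<Triple> triples,
                                                   List<Constraint> constraints) {
        Map<Constraint, List<Triple>> bags = new IdentityHashMap<>();
        for (Constraint c : constraints) bags.put(c, new ArrayList<>());
        return assign(0, triples, constraints, bags) ? bags : null;
    }

    private static boolean assign(int i, List<Triple> triples, List<Constraint> constraints,
                                  Map<Constraint, List<Triple>> bags) {
        if (i == triples.size()) {
            // every triple is placed; now check the lower bounds
            for (Constraint c : constraints) {
                if (bags.get(c).size() < c.min()) return false;
            }
            return true;
        }
        Triple t = triples.get(i);
        for (Constraint c : constraints) {
            List<Triple> bag = bags.get(c);
            if (c.predicate().equals(t.p()) && c.valueTest().test(t.o()) && bag.size() < c.max()) {
                bag.add(t);
                if (assign(i + 1, triples, constraints, bags)) return true;
                bag.remove(bag.size() - 1); // backtrack: undo and try the next bag
            }
        }
        return false; // closed: no bag can take this triple under the current choices
    }

    public static void main(String[] args) {
        Constraint any = new Constraint("anyKnows", "ex:knows", o -> true, 1, 1);
        Constraint startsA = new Constraint("knowsA", "ex:knows", o -> o.startsWith("ex:a"), 1, 1);
        List<Triple> data = List.of(new Triple("ex:knows", "ex:alice"),
                                    new Triple("ex:knows", "ex:bob"));
        // A greedy pass would put ex:alice into anyKnows and then fail on ex:bob;
        // backtracking finds knowsA={alice}, anyKnows={bob}.
        Map<Constraint, List<Triple>> result = partition(data, List.of(any, startsA));
        result.forEach((c, bag) -> System.out.println(c.name() + " -> " + bag));
    }
}
```

This naive search is exponential in the worst case.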
>
> SHACL does not have this "use once". The sub-shapes of a shape are more
> independent. But it's only the compositional operations that matter -
> not the basic triple constraints. And it matters less if what is being
> extracted is a graph, because a graph is a set.
>
> Some restrictions are necessary - SHACL-SPARQL does say why a constraint
> matched and can need the rest of the graph.
>
>      Andy
>
> On 10/03/2022 19:19, Florian Kleedorfer wrote:
> > Not sure how that could work. You could keep a set of triples per focus
> > node validation and add all triples that pass the constraint tests (given
> > that you are somehow able to reconstruct the triple(s) from the data the
> > SHACL logic is working on, which is not triples but, in many instances,
> > sets of nodes, e.g. the result of G.allSP()), then emit the set once
> > you've established that the complete shape is valid for the focus node.
> > I would be very sceptical of adding such a special-interest aspect to
> > code like SHACL that must be reliable and as fast as possible.
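A minimal sketch of that buffering idea, using a toy in-memory triple set rather than Jena's `Graph`. The `Constraint` interface, `minCount` and `extract` are hypothetical names. Each constraint reports the triples it relied on, reconstructed from value nodes as (focus, path, value), and nothing is emitted unless every constraint passes:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: per focus node, accumulate the triples that made each constraint
// pass, and only emit them when all constraints of the shape pass.
final class FocusNodeCollector {

    record Triple(String s, String p, String o) {}

    // Hypothetical constraint: returns the triples it relied on if it passes,
    // or null if it fails.
    interface Constraint {
        Set<Triple> check(Set<Triple> graph, String focus);
    }

    static Set<String> values(Set<Triple> graph, String s, String p) {
        return graph.stream()
                .filter(t -> t.s().equals(s) && t.p().equals(p))
                .map(Triple::o)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Example constraint in the spirit of sh:minCount on a simple predicate path.
    // Value nodes are turned back into (focus, path, value) triples; that
    // reconstruction only works for plain predicate paths.
    static Constraint minCount(String path, int min) {
        return (graph, focus) -> {
            Set<String> vs = values(graph, focus, path);
            if (vs.size() < min) return null;
            Set<Triple> used = new LinkedHashSet<>();
            for (String v : vs) used.add(new Triple(focus, path, v));
            return used;
        };
    }

    static Set<Triple> extract(Set<Triple> graph, String focus, List<Constraint> shape) {
        Set<Triple> pending = new LinkedHashSet<>();
        for (Constraint c : shape) {
            Set<Triple> used = c.check(graph, focus);
            if (used == null) return Set.of(); // shape fails: discard the buffer
            pending.addAll(used);
        }
        return pending; // whole shape conforms: emit
    }

    public static void main(String[] args) {
        Set<Triple> g = Set.of(new Triple("ex:a", "ex:name", "A"),
                               new Triple("ex:a", "ex:age", "3"),
                               new Triple("ex:b", "ex:age", "4"));
        List<Constraint> shape = List.of(minCount("ex:name", 1), minCount("ex:age", 1));
        System.out.println(extract(g, "ex:a", shape)); // both ex:a triples
        System.out.println(extract(g, "ex:b", shape)); // [] because ex:b has no name
    }
}
```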
> >
> > Having said that, I've wanted to modify the way Jena evaluates SHACL
> > recently - maybe a way to extend it would be useful (allowing
> > inheritance, or some kind of callback). However, I
> > found that for my use case, the trick with the graph wrapper that
> > observes which triples are pulled by SHACL works just fine and is very
> > simple to implement (the SHACL validation algorithm, if you want to
> > modify it, is not that simple, and is easy to mess up).
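The graph-wrapper trick can be sketched without Jena as a delegating graph that records every triple returned by `find`. The two-method `Graph` interface here is a hypothetical stand-in (null acts as a wildcard); in Jena the wrapper would presumably sit around an `org.apache.jena.graph.Graph` and intercept its `find(...)` calls:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Sketch of the "graph wrapper" trick with a hypothetical mini graph API.
final class RecordingGraphDemo {

    record Triple(String s, String p, String o) {}

    interface Graph {
        List<Triple> find(String s, String p, String o); // null = wildcard
    }

    static final class InMemoryGraph implements Graph {
        private final List<Triple> triples;

        InMemoryGraph(List<Triple> triples) { this.triples = List.copyOf(triples); }

        @Override
        public List<Triple> find(String s, String p, String o) {
            List<Triple> out = new ArrayList<>();
            for (Triple t : triples) {
                if ((s == null || s.equals(t.s())) && (p == null || p.equals(t.p()))
                        && (o == null || o.equals(t.o()))) {
                    out.add(t);
                }
            }
            return out;
        }
    }

    // Delegates every lookup and remembers each triple handed back to the caller.
    static final class RecordingGraph implements Graph {
        private final Graph base;
        private final Set<Triple> seen = new LinkedHashSet<>();

        RecordingGraph(Graph base) { this.base = Objects.requireNonNull(base); }

        @Override
        public List<Triple> find(String s, String p, String o) {
            List<Triple> result = base.find(s, p, o);
            seen.addAll(result);
            return result;
        }

        Set<Triple> seen() { return new LinkedHashSet<>(seen); }
    }

    public static void main(String[] args) {
        Graph data = new InMemoryGraph(List.of(
                new Triple("ex:a", "ex:name", "A"),
                new Triple("ex:a", "ex:age", "3"),
                new Triple("ex:b", "ex:name", "B")));
        RecordingGraph recorder = new RecordingGraph(data);
        // Stand-in for a validator checking a property shape on focus node ex:a.
        recorder.find("ex:a", "ex:name", null);
        System.out.println(recorder.seen()); // only the ex:a ex:name triple
    }
}
```

This records every triple the validator reads, including lookups made for constraints that end up failing, so it over-approximates the conforming data unless recording is scoped to focus nodes that pass.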
> >
> > Am 2022-03-09 14:26, schrieb Thomas Francart:
> >> What if VLib.validateShape actually returned the focusNode + path +
> >> valueNodes that conform to each shape? Or emitted them through a
> >> listener?
> >> (https://github.com/apache/jena/blob/5ce8c141d425655bcaa9d7567117659e502a7ff1/jena-shacl/src/main/java/org/apache/jena/shacl/validation/VLib.java#L89)
> >> The idea would be to use the Validator as a "filter" that emits the
> >> triples valid according to the shapes, so that they can be aggregated
> >> in an output graph.
>
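A sketch of the listener variant. `ConformanceListener` is a hypothetical interface, not something `VLib.validateShape` offers. It shows how a reported focus node, path and value nodes could be turned back into triples (plain and inverse predicate paths only) and collected in a set, so repeated reports collapse:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: a validator-side callback (hypothetical) feeding an output graph.
final class TripleEmittingListener {

    record Triple(String s, String p, String o) {}

    /** Hypothetical callback a validator could invoke for each constraint that passes. */
    interface ConformanceListener {
        void conforms(String focusNode, String path, Set<String> valueNodes);
    }

    /** Aggregates reported (focus, path, value) combinations into a set of triples. */
    static final class GraphBuilder implements ConformanceListener {
        private final Set<Triple> output = new LinkedHashSet<>();

        @Override
        public void conforms(String focusNode, String path, Set<String> valueNodes) {
            for (String v : valueNodes) {
                if (path.startsWith("^")) {
                    // inverse path ^p: the stored triple points from value to focus
                    output.add(new Triple(v, path.substring(1), focusNode));
                } else {
                    output.add(new Triple(focusNode, path, v));
                }
            }
        }

        Set<Triple> output() { return output; }
    }

    public static void main(String[] args) {
        GraphBuilder builder = new GraphBuilder();
        builder.conforms("ex:a", "ex:name", Set.of("A"));
        builder.conforms("ex:a", "ex:name", Set.of("A"));   // duplicate report, no new triple
        builder.conforms("ex:a", "^ex:knows", Set.of("ex:b"));
        System.out.println(builder.output().size()); // prints 2
    }
}
```

As written, a constraint that passes is reported even if the shape as a whole later fails, so a real version would probably buffer per focus node and discard reports for non-conforming focus nodes.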


-- 

*Thomas Francart* - *SPARNA*
Web of *data* | *Information* architecture | Access to
*knowledge*
blog : blog.sparna.fr, site : sparna.fr, linkedin :
fr.linkedin.com/in/thomasfrancart
tel :  +33 (0)6.71.11.25.97, skype : francartthomas