Posted to users@jena.apache.org by Jeffrey Kenneth Tyzzer <jk...@ucdavis.edu.INVALID> on 2021/01/30 00:44:47 UTC

"Multiplicative effect" in certain SPARQL query results

I have a query for academic articles that looks like this:

--Query 1--

CONSTRUCT {
    ?publication a ?citeType ;
        cite:title ?publicationTitle ;
        cite:issn ?issn ;
        cite:eissn ?eissn ;
        cite:container-title ?journalTitle ;
        cite:author [ cite:rank ?authorRank ; cite:family ?authorLastName ; cite:given ?authorFirstName ]
}
WHERE {
    GRAPH <http://test/pubs/> {
        BIND(<http://test/1483451> AS ?publication)
        VALUES (?citeType ?citeText ?cslType) {
            (bibo:AcademicArticle "journal-article" "article-journal")
            (bibo:Book "book" "book")
            (vivo:ConferencePaper "conference" "paper-conference")
            (bibo:Chapter "chapter" "chapter")
        }
        ?publication a ?citeType ;
            rdfs:label ?publicationTitle ;
            vivo:relatedBy [ a vivo:Authorship ;
                             vivo:rank ?authorRank ;
                             vivo:relates [ a vcard:Individual ;
                                            vcard:hasName [ a vcard:Name ;
                                                            vcard:last_name ?authorLastName ;
                                                            vcard:first_name ?authorFirstName ] ] ]
        OPTIONAL {
            ?publication vivo:hasPublicationVenue ?journal .
            ?journal bibo:eissn ?eissn ;
                bibo:issn ?issn ;
                rdfs:label ?journalTitle ;
        }
    }
}

--End query 1--

What’s important to note is that the article has four authors, and its journal has two ISSNs and two titles (it changed its name a few decades ago, but the eISSN didn’t change). As you can see, the authors are retrieved and CONSTRUCTed as blank nodes (incidentally, the data model is Article <--> Authorship <--> Individual --> Name).

The problem I’m having is that, because there are two ISSNs and two titles, the CONSTRUCT returns 16 author triples (4 authors x 2 ISSNs x 2 titles), i.e.:

--Query 1 output--

<http://test/1483451>
        a                     bibo:AcademicArticle ;
        cite:author           [ cite:family  "Rafter" ;
                                cite:given   "J" ;
                                cite:rank    1
                              ] ;
        cite:author           [ cite:family  "Benham" ;
                                cite:given   "K" ;
                                cite:rank    3
                              ] ;
        cite:author           [ cite:family  "Mastro" ;
                                cite:given   "RHR" ;
                                cite:rank    4
                              ] ;
        cite:author           [ cite:family  "Andersen" ;
                                cite:given   "D" ;
                                cite:rank    2
                              ] ;
        cite:author           [ cite:family  "Mastro" ;
                                cite:given   "RHR" ;
                                cite:rank    4
                              ] ;
        cite:author           [ cite:family  "Andersen" ;
                                cite:given   "D" ;
                                cite:rank    2
                              ] ;
        cite:author           [ cite:family  "Andersen" ;
                                cite:given   "D" ;
                                cite:rank    2
                              ] ;
        cite:author           [ cite:family  "Mastro" ;
                                cite:given   "RHR" ;
                                cite:rank    4
                              ] ;
        cite:author           [ cite:family  " Benham " ;
                                cite:given   "K" ;
                                cite:rank    3
                              ] ;
        cite:author           [ cite:family  "Rafter" ;
                                cite:given   "J" ;
                                cite:rank    1
                              ] ;
        cite:author           [ cite:family  " Benham " ;
                                cite:given   "K" ;
                                cite:rank    3
                              ] ;
        cite:author           [ cite:family  "Andersen" ;
                                cite:given   "D" ;
                                cite:rank    2
                              ] ;
        cite:author           [ cite:family  "Mastro" ;
                                cite:given   "RHR" ;
                                cite:rank    4
                              ] ;
        cite:author           [ cite:family  "Rafter" ;
                                cite:given   "J" ;
                                cite:rank    1
                              ] ;
        cite:author           [ cite:family  " Benham " ;
                                cite:given   "K" ;
                                cite:rank    3
                              ] ;
        cite:author           [ cite:family  "Rafter" ;
                                cite:given   "J" ;
                                cite:rank    1
                              ] ;
        cite:container-title  "Journal of the American Ceramic Society" , "Advanced Ceramic Materials" ;
        cite:eissn            "1551-2916" ;
        cite:issn             "0883-5551" , "0002-7820" ;
        cite:title            "Synthesis and sintering behavior of spinel nanoparticles" .

--End query 1 output--

If I comment out the bibo:issn ?issn and rdfs:label ?journalTitle patterns in the WHERE clause, or if I don’t use the [ cite:rank ?authorRank ; cite:family ?authorLastName ; cite:given ?authorFirstName ] structure in the CONSTRUCT, I get what I expect:

--Query 1a output--

<http://test/1483451>
        a            bibo:AcademicArticle ;
        cite:author  [ cite:family  "Andersen" ;
                       cite:given   "D" ;
                       cite:rank    2
                     ] ;
        cite:author  [ cite:family  " Benham " ;
                       cite:given   "K" ;
                       cite:rank    3
                     ] ;
        cite:author  [ cite:family  "Mastro" ;
                       cite:given   "RHR" ;
                       cite:rank    4
                     ] ;
        cite:author  [ cite:family  "Rafter" ;
                       cite:given   "J" ;
                       cite:rank    1
                     ] ;
        cite:eissn   "1551-2916" ;
        cite:title   "Synthesis and sintering behavior of spinel nanoparticles" .

--End query 1a output--

If I switch the CONSTRUCT to a SELECT, I see (and would expect) 16 rows, but I was not expecting the CONSTRUCT to behave like this (i.e., to produce such a product of the triples). Could one of you explain what’s going on under the covers, and whether there’s a remedy for this behavior?

Thanks much.

--Jeff

Re: "Multiplicative effect" in certain SPARQL query results

Posted by Andy Seaborne <an...@apache.org>.

On 30/01/2021 00:44, Jeffrey Kenneth Tyzzer wrote:
> If I switch the CONSTRUCT to a SELECT I see (and would expect) 16 rows, but was not anticipating the CONSTRUCT to behave like this (i.e., express such a product of the triples). Can one of you kindly explain what’s going on under the covers and if there’s a remedy for this behavior?
> 

CONSTRUCT works like this:

+ execute the WHERE clause as a SELECT *
+ result model = empty graph
+ feed the SELECT rows, one at a time, into the template to produce RDF
     (your query has ?citeText in these rows)
+ add each template instantiation to the result model
+ return the result model

and you have a blank node in the template.

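Each time the template is used, you get a fresh blank node.
16 rows, 16 blank nodes, 16 unique "cite:author [ ... ]" property-values.
The other template triples (cite:issn, cite:container-title, ...) contain no blank nodes, so repeated instantiations produce identical triples, and the result graph, being a set of triples, keeps only one copy of each.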
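The steps above can be sketched as a toy Python model. This is not Jena's implementation: the dictionaries, the tuple triples and the `_:bN` labels are hypothetical simplifications of RDF terms, and the template is cut down to its issn, container-title and author parts. It shows why a template blank node multiplies with the rows while triples without blank nodes do not:

```python
import itertools

# Hypothetical, simplified stand-in for the data in this thread:
# one article, four authors, one journal with two ISSNs and two titles.
PUB = "<http://test/1483451>"
AUTHORS = [(1, "Rafter", "J"), (2, "Andersen", "D"), (3, "Benham", "K"), (4, "Mastro", "RHR")]
ISSNS = ["0883-5551", "0002-7820"]
TITLES = ["Journal of the American Ceramic Society", "Advanced Ceramic Materials"]


def where_rows():
    """Step 1: the WHERE clause as a SELECT *, one row per combination."""
    for (rank, family, given), issn, title in itertools.product(AUTHORS, ISSNS, TITLES):
        yield {"rank": rank, "family": family, "given": given, "issn": issn, "title": title}


def construct(rows):
    """Remaining steps: instantiate the template once per row into one graph."""
    graph = set()  # a graph is a set of triples, so identical triples collapse
    labels = itertools.count()
    for row in rows:
        author = f"_:b{next(labels)}"  # the template's [ ... ] gets a fresh blank node per row
        graph.add((PUB, "cite:issn", row["issn"]))
        graph.add((PUB, "cite:container-title", row["title"]))
        graph.add((PUB, "cite:author", author))
        graph.add((author, "cite:rank", row["rank"]))
        graph.add((author, "cite:family", row["family"]))
        graph.add((author, "cite:given", row["given"]))
    return graph


def count(graph, predicate):
    """Number of triples in `graph` with the given predicate."""
    return sum(1 for (_s, p, _o) in graph if p == predicate)


g = construct(where_rows())
print(count(g, "cite:author"))  # 16: one distinct blank node per row
print(count(g, "cite:issn"))    # 2: the repeated issn triples collapse
```

The result graph is modelled as a Python set because an RDF graph is a set of triples; two blank nodes minted for different rows are different terms, so their triples never compare equal.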

If the WHERE is

WHERE {
    SELECT DISTINCT <only variables used in the template>
    WHERE {
...
    }
}

and specifically not ?citeText, you will get less duplication.
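A toy Python model of what the inner SELECT DISTINCT does to the rows before they reach the template may help. The data and variable names are hypothetical, and real SPARQL DISTINCT compares RDF terms rather than Python values. Rows that differ only in a variable the template never uses, such as ?citeText, collapse into one, so one fewer blank node is minted; rows that differ in a template variable such as ?issn are kept, and each still gets its own blank node:

```python
# Hypothetical subset of the variables used in the CONSTRUCT template
# (?citeText is deliberately not one of them).
TEMPLATE_VARS = ("publication", "authorRank", "issn")


def select_distinct(rows, variables):
    """Mimic SELECT DISTINCT: project each row onto `variables`, drop repeats."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row.get(v) for v in variables)  # a missing (unbound) variable becomes None
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(variables, key)))
    return result


# Hypothetical rows: the first two differ only in ?citeText, which the template
# never uses; the third differs in ?issn, which the template does use.
rows = [
    {"publication": "http://test/1483451", "authorRank": 1, "issn": "0883-5551", "citeText": "x"},
    {"publication": "http://test/1483451", "authorRank": 1, "issn": "0883-5551", "citeText": "y"},
    {"publication": "http://test/1483451", "authorRank": 1, "issn": "0002-7820", "citeText": "x"},
]

distinct_rows = select_distinct(rows, TEMPLATE_VARS)
print(len(rows), "->", len(distinct_rows))  # 3 -> 2
```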

     Andy

PS Could you make the query more readable and also include the prefixes,
so the reader can read it in the email or take it and parse it locally?
Thanks.



