Posted to dev@jena.apache.org by "Lorenz Bühmann (Jira)" <ji...@apache.org> on 2022/02/20 08:22:00 UTC
[jira] [Commented] (JENA-2288) Counting aggregation inside SERVICE provides wrong result
[ https://issues.apache.org/jira/browse/JENA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495101#comment-17495101 ]
Lorenz Bühmann commented on JENA-2288:
--------------------------------------
Some comments:
1) I don't think it is the same to run
{code:sql}
OPTIONAL {
  SERVICE <> {
  }
}
{code}
vs
{code:sql}
SERVICE <> {
  OPTIONAL {
  }
}
{code}
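In your query, the first form is a left join and the second is a join. With the OPTIONAL outside, your VALUES row survives even if the SERVICE returns no compatible row. With the OPTIONAL inside, the remote result is joined with your VALUES row, so that row is dropped when every remote row binds ?wikidata_iri to a different value.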
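To make the difference concrete, here is a toy model of SPARQL's join and left join over solution mappings. It is plain Python, not Jena code, and each mapping is a dict from variable name to value. FILTERs are ignored. The identifiers (`Q_other`, `M1`) are made-up placeholders, and the remote rows are assumed to contain nothing for Q612.

```python
def compatible(a, b):
    # Two solution mappings are compatible if they agree on every shared variable.
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(left, right):
    # Inner join: keep only merged pairs of compatible mappings.
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def left_join(left, right):
    # Left join (OPTIONAL without FILTER): a left mapping with no
    # compatible right mapping is kept unchanged.
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(matches if matches else [dict(l)])
    return out

# Outer VALUES binding (IRIs shortened to local names).
values = [{"wikidata_iri": "Q612"}]
# Hypothetical remote rows for the inner pattern; none is for Q612.
remote = [{"wikidata_iri": "Q_other", "museum": "M1"}]

# OPTIONAL { SERVICE <...> { P } }: the outer row survives.
print(left_join(values, remote))  # [{'wikidata_iri': 'Q612'}]
# SERVICE <...> { OPTIONAL { P } }: P matched remotely, so the SERVICE
# yields exactly those rows, and the join with VALUES drops everything.
print(join(values, remote))       # []
```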
2) You should start debugging directly on Wikidata.
The second query, with the value inlined, is:
{code:sql}
select ?wikidata_iri (COUNT(?museum) as ?museum_count_in_city)
where {
  OPTIONAL {
    ?museum (wdt:P131)+ <http://www.wikidata.org/entity/Q612> ;
            wdt:P31/(wdt:P279)* wd:Q33506 .
  }
} group by ?wikidata_iri
{code}
This returns 204. The reason is probably the way Blazegraph handles the property paths: it produces lots of duplicate solutions. That said, I don't understand why you get 201 rather than 204; that part looks like magic to me.
Using
{code:sql}
(COUNT(DISTINCT ?museum) as ?museum_count_in_city){code}
will solve this issue.
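A small Python model of COUNT over grouped solutions shows why DISTINCT helps. The rows are hypothetical: one museum appears three times, standing in for the duplicate solutions Blazegraph appears to produce for the property paths. As in SPARQL, COUNT(?x) does not count solutions where ?x is unbound (modelled here as None).

```python
from collections import defaultdict

def count_per_group(rows, group_var, count_var, distinct=False):
    """Toy model of SELECT ?g (COUNT([DISTINCT] ?x) AS ?n) ... GROUP BY ?g."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(group_var)].append(row.get(count_var))
    result = {}
    for key, vals in groups.items():
        bound = [v for v in vals if v is not None]  # unbound values are not counted
        result[key] = len(set(bound)) if distinct else len(bound)
    return result

# Hypothetical solutions: museum "M1" is returned three times.
rows = [
    {"wikidata_iri": "Q612", "museum": "M1"},
    {"wikidata_iri": "Q612", "museum": "M1"},
    {"wikidata_iri": "Q612", "museum": "M1"},
    {"wikidata_iri": "Q612", "museum": "M2"},
    {"wikidata_iri": "Q612", "museum": "M3"},
]
print(count_per_group(rows, "wikidata_iri", "museum"))                 # {'Q612': 5}
print(count_per_group(rows, "wikidata_iri", "museum", distinct=True))  # {'Q612': 3}
```

Plain COUNT counts every solution the endpoint returns. COUNT(DISTINCT) counts each museum once, no matter how many duplicate rows the path evaluation yields.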
The first query returns 3 because the value is not inlined (see your other issue). Jena just runs
{code:sql}
SELECT ?wikidata_iri ?museum
WHERE
{ OPTIONAL
{ ?museum (<http://www.wikidata.org/prop/direct/P131>)+ ?wikidata_iri .
?museum <http://www.wikidata.org/prop/direct/P31>/(<http://www.wikidata.org/prop/direct/P279>)* <http://www.wikidata.org/entity/Q33506>
}
} {code}
on the endpoint and then performs the join locally: the endpoint returns 213201 museums with their locations, and Jena joins them with your value.
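The non-inlined evaluation can be modelled the same way: fetch every (museum, city) row from the endpoint, then join locally with the VALUES row. This Python sketch uses a made-up five-row dataset in place of the 213201 rows, and `remote_eval` is a hypothetical stand-in for the endpoint. In principle both strategies return the same solutions; inlining just lets the endpoint do the filtering.

```python
# Hypothetical (museum, city) pairs standing in for the remote data.
LOCATED_IN = [("M1", "Q612"), ("M2", "Q612"), ("M3", "Q612"),
              ("M4", "Q_a"), ("M5", "Q_b")]

def remote_eval(city=None):
    # Toy endpoint: city=None leaves ?wikidata_iri unbound (not inlined);
    # otherwise the value is substituted into the pattern (inlined).
    return [{"wikidata_iri": c, "museum": m}
            for m, c in LOCATED_IN if city is None or c == city]

def join(left, right):
    # Inner join of solution mappings that agree on shared variables.
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())]

values = [{"wikidata_iri": "Q612"}]

not_inlined = remote_eval()            # every row is shipped back...
joined = join(values, not_inlined)     # ...and filtered by the local join
inlined = remote_eval("Q612")          # the endpoint filters instead

print(len(not_inlined), len(joined), len(inlined))  # 5 3 3
print(joined == inlined)                            # True
```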
Honestly, I don't know why Blazegraph produces different results for
{code:sql}
SELECT ?wikidata_iri ?museum
WHERE
{ OPTIONAL
{ ?museum (<http://www.wikidata.org/prop/direct/P131>)+ ?wikidata_iri .
?museum <http://www.wikidata.org/prop/direct/P31>/(<http://www.wikidata.org/prop/direct/P279>)* <http://www.wikidata.org/entity/Q33506>
}
} {code}
compared to the inlined variant
{code:sql}
SELECT ?wikidata_iri ?museum
WHERE
{ OPTIONAL
{ ?museum (<http://www.wikidata.org/prop/direct/P131>)+ <http://www.wikidata.org/entity/Q612> .
?museum <http://www.wikidata.org/prop/direct/P31>/(<http://www.wikidata.org/prop/direct/P279>)* <http://www.wikidata.org/entity/Q33506>
}
} {code}
but you can verify the difference by running both queries directly on the Wikidata endpoint.
Fun fact: when you run your whole Q2 on Wikidata itself (i.e. Wikidata sends the service request to itself), it works without DISTINCT.
Long story short:
- Blazegraph produces different results for the inlined and the non-inlined query
- Jena doesn't inline the data for Q1; it fetches all results and does a join
- Jena does inline the data for Q2, but Blazegraph produces lots of duplicates; COUNT(DISTINCT ?museum) will help here
- OPTIONAL inside a SERVICE request is not the same as a SERVICE request inside an OPTIONAL
> Counting aggregation inside SERVICE provides wrong result
> ---------------------------------------------------------
>
> Key: JENA-2288
> URL: https://issues.apache.org/jira/browse/JENA-2288
> Project: Apache Jena
> Issue Type: Bug
> Affects Versions: Jena 4.4.0
> Reporter: Dmitry Zhelobanov
> Priority: Major
>
> Here is a query which retrieves museums in a specific city:
> {code:java}
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> SELECT ?wikidata_iri ?museum
> WHERE {
> VALUES (?wikidata_iri) { (<http://www.wikidata.org/entity/Q612>) } .
>
> SERVICE <https://query.wikidata.org/sparql> {
> {
> select ?wikidata_iri ?museum
> where {
> OPTIONAL {
> ?museum (wdt:P131)+ ?wikidata_iri ;
> wdt:P31/(wdt:P279)* wd:Q33506 .
> }
> }
> }
> }
> } {code}
> This query returns 3 results:
> |<http://www.wikidata.org/entity/Q612>|<http://www.wikidata.org/entity/Q2125281>|
> |<http://www.wikidata.org/entity/Q612>|<http://www.wikidata.org/entity/Q28736367>|
> |<http://www.wikidata.org/entity/Q612>|<http://www.wikidata.org/entity/Q67737768>|
> And here is a query which is supposed to count the number of the same museums in the same city:
> {code:java}
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> SELECT ?wikidata_iri ?museum_count_in_city
> WHERE {
> VALUES (?wikidata_iri) { (<http://www.wikidata.org/entity/Q612>) } .
>
> SERVICE <https://query.wikidata.org/sparql> {
> {
> select ?wikidata_iri (COUNT(?museum) as ?museum_count_in_city)
> where {
> OPTIONAL {
> ?museum (wdt:P131)+ ?wikidata_iri ;
> wdt:P31/(wdt:P279)* wd:Q33506 .
> }
> } group by ?wikidata_iri
> }
> }
> }{code}
> But the count value produced by the query is wrong:
> |<http://www.wikidata.org/entity/Q612>|"201"^^<http://www.w3.org/2001/XMLSchema#integer>|
> It outputs *201* instead of the expected *3*.