Posted to dev@jena.apache.org by "Lorenz Bühmann (Jira)" <ji...@apache.org> on 2022/02/20 08:22:00 UTC

[jira] [Commented] (JENA-2288) Counting aggregation inside SERVICE provides wrong result

    [ https://issues.apache.org/jira/browse/JENA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495101#comment-17495101 ] 

Lorenz Bühmann commented on JENA-2288:
--------------------------------------

Some comments:

1)

I don't think it is the same to run

 
{code:sql}
OPTIONAL {
 SERVICE <> {
  }
}
{code}
vs

 
{code:sql}
SERVICE <> {
  OPTIONAL {
  }
}
{code}
The first is a left join, the latter a plain join in your query.
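
To make the difference concrete, here is an untested sketch of both shapes. The endpoint and the P131 pattern are taken from the issue; the simplified query bodies are only illustrative, and the unconstrained path may well be too expensive to run as-is.
{code:sql}
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Shape A: SERVICE inside OPTIONAL (left join).
# The VALUES row survives even if the service finds no compatible museum;
# ?museum is then simply unbound.
SELECT ?wikidata_iri ?museum
WHERE {
  VALUES ?wikidata_iri { wd:Q612 }
  OPTIONAL {
    SERVICE <https://query.wikidata.org/sparql> {
      ?museum (wdt:P131)+ ?wikidata_iri .
    }
  }
}
{code}
{code:sql}
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Shape B: OPTIONAL inside SERVICE (plain join with the VALUES row).
# The OPTIONAL is part of the remote pattern. If the SERVICE block is
# evaluated on its own, its solutions are joined with the VALUES row
# afterwards, and a row without a compatible remote solution is dropped.
SELECT ?wikidata_iri ?museum
WHERE {
  VALUES ?wikidata_iri { wd:Q612 }
  SERVICE <https://query.wikidata.org/sparql> {
    OPTIONAL { ?museum (wdt:P131)+ ?wikidata_iri . }
  }
}
{code}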

2) you should start debugging from within Wikidata:

the second query with the value inlined is:

 
{code:sql}
select ?wikidata_iri (COUNT(?museum) as ?museum_count_in_city)
      where {
        OPTIONAL {
          ?museum (wdt:P131)+ <http://www.wikidata.org/entity/Q612> ;
                   wdt:P31/(wdt:P279)* wd:Q33506 .
        }
      } group by ?wikidata_iri {code}
and this returns 204 as the result. The reason is probably how Blazegraph handles the property paths: it just produces lots of duplicates. Not that I understand why you would get 201 instead of 204, sounds like magic.
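
One way to check the duplicate theory directly on the Wikidata endpoint is to count rows and distinct museums side by side. This is a sketch that reuses the patterns from the inlined query above; the variable names ?rows and ?distinct_museums are made up for illustration.
{code:sql}
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# If ?rows is larger than ?distinct_museums, the path evaluation
# returned the same museum more than once.
SELECT (COUNT(?museum) AS ?rows) (COUNT(DISTINCT ?museum) AS ?distinct_museums)
WHERE {
  ?museum (wdt:P131)+ wd:Q612 ;
          wdt:P31/(wdt:P279)* wd:Q33506 .
}
{code}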

Doing
{code:sql}
(COUNT(DISTINCT ?museum) as ?museum_count_in_city){code}
will solve this issue.
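
Applied to the second query from the issue, that would look roughly like this; only the aggregate changes, everything else is copied from the report:
{code:sql}
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?wikidata_iri ?museum_count_in_city
WHERE {
  VALUES (?wikidata_iri) { (<http://www.wikidata.org/entity/Q612>) } .

  SERVICE <https://query.wikidata.org/sparql> {
    {
      # DISTINCT collapses the duplicate ?museum bindings before counting
      select ?wikidata_iri (COUNT(DISTINCT ?museum) as ?museum_count_in_city)
      where {
        OPTIONAL {
          ?museum (wdt:P131)+ ?wikidata_iri ;
                   wdt:P31/(wdt:P279)* wd:Q33506 .
        }
      } group by ?wikidata_iri
    }
  }
}
{code}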

 

The first query returns 3 because the value is not getting inlined (see your other issue). Jena just runs
{code:sql}
SELECT  ?wikidata_iri ?museum
  WHERE
    { OPTIONAL
        { ?museum (<http://www.wikidata.org/prop/direct/P131>)+ ?wikidata_iri .
          ?museum <http://www.wikidata.org/prop/direct/P31>/(<http://www.wikidata.org/prop/direct/P279>)* <http://www.wikidata.org/entity/Q33506>
        }
    } {code}
on the endpoint, then does a join. So, the endpoint returns 213201 museums with their locations, and Jena then joins them with your value.

Honestly I don't know why Blazegraph is producing different results for
{code:sql}
SELECT  ?wikidata_iri ?museum
  WHERE
    { OPTIONAL
        { ?museum (<http://www.wikidata.org/prop/direct/P131>)+ ?wikidata_iri .
          ?museum <http://www.wikidata.org/prop/direct/P31>/(<http://www.wikidata.org/prop/direct/P279>)* <http://www.wikidata.org/entity/Q33506>
        }
    } {code}
compared to the inlined variant
{code:sql}
SELECT  ?wikidata_iri ?museum
  WHERE
    { OPTIONAL
        { ?museum (<http://www.wikidata.org/prop/direct/P131>)+ <http://www.wikidata.org/entity/Q612> .
          ?museum <http://www.wikidata.org/prop/direct/P31>/(<http://www.wikidata.org/prop/direct/P279>)* <http://www.wikidata.org/entity/Q33506>
        }
    } {code}
but you can verify the difference by running both queries directly on the Wikidata endpoint.

Fun fact: when you run your whole Q2 on Wikidata, it works without using DISTINCT, i.e. when Wikidata does a service request to itself.

 

Long story short:
 - Blazegraph produces different results
 - Jena doesn't inline the data for Q1 but gets all results and does a join
 - Jena does inline the data for Q2, but Blazegraph produces lots of duplicates; COUNT(DISTINCT ...) will help here
 - OPTIONAL inside a SERVICE request is not the same as using a SERVICE request inside an OPTIONAL

> Counting aggregation inside SERVICE provides wrong result
> ---------------------------------------------------------
>
>                 Key: JENA-2288
>                 URL: https://issues.apache.org/jira/browse/JENA-2288
>             Project: Apache Jena
>          Issue Type: Bug
>    Affects Versions: Jena 4.4.0
>            Reporter: Dmitry Zhelobanov
>            Priority: Major
>
> Here is a query which retrieves museums in the specific city:
> {code:java}
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> SELECT ?wikidata_iri ?museum
> WHERE {
>   VALUES (?wikidata_iri) { (<http://www.wikidata.org/entity/Q612>) } .
>     
>   SERVICE <https://query.wikidata.org/sparql> {
>     {
>       select ?wikidata_iri ?museum
>       where {
>         OPTIONAL {
>           ?museum (wdt:P131)+ ?wikidata_iri ;
>                    wdt:P31/(wdt:P279)* wd:Q33506 .
>         }
>       }
>     }
>   }
> } {code}
> This query returns 3 results:
> |<http://www.wikidata.org/entity/Q612>|<http://www.wikidata.org/entity/Q2125281>|
> |<http://www.wikidata.org/entity/Q612>|<http://www.wikidata.org/entity/Q28736367>|
> |<http://www.wikidata.org/entity/Q612>|<http://www.wikidata.org/entity/Q67737768>|
> And here is a query which is supposed to count the number of the same museums in the same city:
> {code:java}
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> SELECT ?wikidata_iri ?museum_count_in_city
> WHERE {
>   VALUES (?wikidata_iri) { (<http://www.wikidata.org/entity/Q612>) } .
>   
>   SERVICE <https://query.wikidata.org/sparql> {
>     {
>       select ?wikidata_iri (COUNT(?museum) as ?museum_count_in_city)
>       where {
>         OPTIONAL {
>           ?museum (wdt:P131)+ ?wikidata_iri ;
>                    wdt:P31/(wdt:P279)* wd:Q33506 .
>         }
>       } group by ?wikidata_iri
>     }
>   }
> }{code}
> But the count value produced by the query is wrong:
> |<http://www.wikidata.org/entity/Q612>|"201"^^<http://www.w3.org/2001/XMLSchema#integer>|
> It outputs *201* instead of the expected *3*.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)