Posted to users@jena.apache.org by Zen 98052 <z9...@outlook.com> on 2015/09/22 16:25:10 UTC

question on LIMIT

For example, the query result set (with an ORDER BY clause) is [ a, b, c, d, e, f, g, h, i, j ]

When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as the result.

It should return 3 items instead of 2. Where is the Jena code that I can debug to find the issue (likely a bug in my code)?


Thanks,

Z

Re: question on LIMIT

Posted by Martynas Jusevičius <ma...@graphity.org>.

Please post your query first.

On Tue, Sep 22, 2015 at 4:25 PM, Zen 98052 <z9...@outlook.com> wrote:
> For example, the query result set (with ORDER clause) is [ a, b, c, d, e, f, g, h, i, j ]
>
> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as the result.
>
> It should return 3 items instead of 2, where is the Jena code that I can debug to find out the issue (likely bug in my code)?
>
>
> Thanks,
>
> Z

Re: question on LIMIT

Posted by Zen 98052 <z9...@outlook.com>.

2,482 ms vs. 9,119 ms vs. 7,991 ms

________________________________________
From: Andy Seaborne <an...@apache.org>
Sent: Tuesday, September 22, 2015 3:50 PM
To: users@jena.apache.org
Subject: Re: question on LIMIT

On 22/09/15 19:06, Zen 98052 wrote:
> Thanks Andy! All those 3 queries work as expected. Just an
> information, the first query (putting OFFSET/LIMIT on the top most
> query) is the fastest, second one is query that includes ?name in
> grouping, and the last one is the one using SAMPLE function.

How much difference is there between the different times?

        Andy


>
>
> Thanks, Z

Re: question on LIMIT

Posted by Andy Seaborne <an...@apache.org>.

On 22/09/15 19:06, Zen 98052 wrote:
> Thanks Andy! All those 3 queries work as expected. Just an
> information, the first query (putting OFFSET/LIMIT on the top most
> query) is the fastest, second one is query that includes ?name in
> grouping, and the last one is the one using SAMPLE function.

How much difference is there between the different times?

	Andy


>
>
> Thanks, Z

Re: question on LIMIT

Posted by Zen 98052 <z9...@outlook.com>.

Thanks Andy! All 3 of those queries work as expected.
For information, the first query (putting OFFSET/LIMIT on the topmost query) is the fastest, the second is the query that includes ?name in the grouping, and the last is the one using the SAMPLE function.


Thanks,
Z
________________________________________
From: Andy Seaborne <an...@apache.org>
Sent: Tuesday, September 22, 2015 12:30 PM
To: users@jena.apache.org
Subject: Re: question on LIMIT

On 22/09/15 17:02, Zen 98052 wrote:
> Hi Andy,
> I don't see any of my code referencing to QueryIterSlice or OpSlice.
> To answer question from Martynas, here is the query without offset:
>
> # What languages are most popular in Wikipedia
> SELECT ?name ?no_entities {
>      { SELECT ?language (COUNT(?x) as ?no_entities)
>      WHERE {
>          ?x dbpediaont:programmingLanguage ?language .
>      }
>      GROUP BY ?language
>      ORDER BY DESC(?no_entities)
>      LIMIT 10
>      } .
>      ?language foaf:name ?name
> }
> ORDER BY DESC(?no_entities)
>
> And it returns:
>
>    "results": {
>      "bindings": [
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "C" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "932" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "C++" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "930" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "Java" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "514" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "Python" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "354" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "Perl" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "99" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "Objective-C" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "75" }
>        }
>      ]
>    }
>
>
> If I modify the query to add OFFSET and also change the number on LIMIT, for example:
>
> # What languages are most popular in Wikipedia
> SELECT ?name ?no_entities {
>      { SELECT ?language (COUNT(?x) as ?no_entities)
>      WHERE {
>          ?x dbpediaont:programmingLanguage ?language .
>      }
>      GROUP BY ?language
>      ORDER BY DESC(?no_entities)
>      OFFSET 4
>      LIMIT 3
>      } .
>      ?language foaf:name ?name
> }
> ORDER BY DESC(?no_entities)
>
> Now I got this result:
>
>    "results": {
>      "bindings": [
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
>        }
>      ]
>    }
>
>
> Something wrong with the SPARQL query?

yes.

You get the OFFSET 4 / LIMIT 3 slice out of the inner part, but then
process it further with a join, and some of those rows may not match.

Try putting the OFFSET/LIMIT on the topmost query:


# What languages are most popular in Wikipedia
SELECT ?name ?no_entities {
     { SELECT ?language (COUNT(?x) as ?no_entities)
     WHERE {
         ?x dbpediaont:programmingLanguage ?language .
     }
     GROUP BY ?language
     ORDER BY DESC(?no_entities)
     } .
     ?language foaf:name ?name
}
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3
----------------

Or remove the inner query and use SAMPLE to grab the name (is there
only one name per ?language?):

SELECT (SAMPLE(?_name) AS ?name) ?language (COUNT(?x) as ?no_entities)
{
    ?x dbpediaont:programmingLanguage ?language .
    ?language foaf:name ?_name
}
GROUP BY ?language
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3

or include ?name in grouping:

SELECT ?name ?language (COUNT(?x) AS ?no_entities)
{
   ?x dbpediaont:programmingLanguage ?language .
   ?language foaf:name ?name
}
GROUP BY ?language ?name  ## Include ?name.
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3




>
>
> Thanks,
> Z
>
> ________________________________________
> From: Andy Seaborne <an...@apache.org>
> Sent: Tuesday, September 22, 2015 10:50 AM
> To: users@jena.apache.org
> Subject: Re: question on LIMIT
>
> On 22/09/15 15:42, Andy Seaborne wrote:
>> On 22/09/15 15:25, Zen 98052 wrote:
>>> For example, the query result set (with ORDER clause) is [ a, b, c, d,
>>> e, f, g, h, i, j ]
>>>
>>> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as
>>> the result.
>>
>> FWIW that is the result from OFFSET 2 and LIMIT 2
> Typo
> OFFSET 3 and LIMIT 2
>
>>
>>>
>>> It should return 3 items instead of 2, where is the Jena code that I
>>> can debug to find out the issue (likely bug in my code)?
>>
>> Anything with the word "slice" in it.
>>
>>     QueryIterSlice
>>     OpSlice
>>
>> And ARQ does TopN optimization ORDER+OFFSET+LIMIT is executed
>> differently and more efficiently.  (O(n) not O(n log n))
>>
>>     QueryIterTopN
>>     OpTopN
>>
>>>
>>>
>>> Thanks,
>>>
>>> Z
>>>
>>
>

Re: question on LIMIT

Posted by Andy Seaborne <an...@apache.org>.

On 22/09/15 17:02, Zen 98052 wrote:
> Hi Andy,
> I don't see any of my code referencing to QueryIterSlice or OpSlice.
> To answer question from Martynas, here is the query without offset:
>
> # What languages are most popular in Wikipedia
> SELECT ?name ?no_entities {
>      { SELECT ?language (COUNT(?x) as ?no_entities)
>      WHERE {
>          ?x dbpediaont:programmingLanguage ?language .
>      }
>      GROUP BY ?language
>      ORDER BY DESC(?no_entities)
>      LIMIT 10
>      } .
>      ?language foaf:name ?name
> }
> ORDER BY DESC(?no_entities)
>
> And it returns:
>
>    "results": {
>      "bindings": [
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "C" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "932" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "C++" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "930" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "Java" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "514" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "Python" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "354" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "Perl" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "99" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "Objective-C" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "75" }
>        }
>      ]
>    }
>
>
> If I modify the query to add OFFSET and also change the number on LIMIT, for example:
>
> # What languages are most popular in Wikipedia
> SELECT ?name ?no_entities {
>      { SELECT ?language (COUNT(?x) as ?no_entities)
>      WHERE {
>          ?x dbpediaont:programmingLanguage ?language .
>      }
>      GROUP BY ?language
>      ORDER BY DESC(?no_entities)
>      OFFSET 4
>      LIMIT 3
>      } .
>      ?language foaf:name ?name
> }
> ORDER BY DESC(?no_entities)
>
> Now I got this result:
>
>    "results": {
>      "bindings": [
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
>        } ,
>        {
>          "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
>          "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
>        }
>      ]
>    }
>
>
> Something wrong with the SPARQL query?

yes.

You get the OFFSET 4 / LIMIT 3 slice out of the inner part, but then
process it further with a join, and some of those rows may not match.
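
To make the effect concrete, here is a minimal Python sketch (not Jena
code). The data is hypothetical: eight languages with made-up counts, one
of which ("f") has no foaf:name. It compares slicing before the join
with slicing after it.

```python
# Hypothetical rows from the GROUP BY / ORDER BY DESC subquery (not real DBpedia data).
counts = [("a", 10), ("b", 9), ("c", 8), ("d", 7),
          ("e", 6), ("f", 5), ("g", 4), ("h", 3)]
# Hypothetical foaf:name triples; "f" has no name, so the join drops it.
names = {"a": "A", "b": "B", "c": "C", "d": "D", "e": "E", "g": "G", "h": "H"}


def join_names(rows):
    """Inner join on language: rows without a name disappear."""
    return [(names[lang], n) for lang, n in rows if lang in names]


def slice_rows(rows, offset, limit):
    """OFFSET/LIMIT: skip `offset` rows, then keep at most `limit` rows."""
    return rows[offset:offset + limit]


# OFFSET/LIMIT inside the subquery: slice first, join afterwards.
inner_sliced = join_names(slice_rows(counts, 4, 3))
# OFFSET/LIMIT on the outer query: join first, slice afterwards.
outer_sliced = slice_rows(join_names(counts), 4, 3)

print(inner_sliced)  # [('E', 6), ('G', 4)]  -> only 2 rows survive the join
print(outer_sliced)  # [('E', 6), ('G', 4), ('H', 3)]
```

The same thing is visible in the LIMIT 10 run quoted above, which
returned only 8 rows.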

Try putting the OFFSET/LIMIT on the topmost query:


# What languages are most popular in Wikipedia
SELECT ?name ?no_entities {
     { SELECT ?language (COUNT(?x) as ?no_entities)
     WHERE {
         ?x dbpediaont:programmingLanguage ?language .
     }
     GROUP BY ?language
     ORDER BY DESC(?no_entities)
     } .
     ?language foaf:name ?name
}
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3
----------------

Or remove the inner query and use SAMPLE to grab the name (is there
only one name per ?language?):

SELECT (SAMPLE(?_name) AS ?name) ?language (COUNT(?x) as ?no_entities)
{
    ?x dbpediaont:programmingLanguage ?language .
    ?language foaf:name ?_name
}
GROUP BY ?language
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3

or include ?name in grouping:

SELECT ?name ?language (COUNT(?x) AS ?no_entities)
{
   ?x dbpediaont:programmingLanguage ?language .
   ?language foaf:name ?name
}
GROUP BY ?language ?name  ## Include ?name.
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3




>
>
> Thanks,
> Z
>
> ________________________________________
> From: Andy Seaborne <an...@apache.org>
> Sent: Tuesday, September 22, 2015 10:50 AM
> To: users@jena.apache.org
> Subject: Re: question on LIMIT
>
> On 22/09/15 15:42, Andy Seaborne wrote:
>> On 22/09/15 15:25, Zen 98052 wrote:
>>> For example, the query result set (with ORDER clause) is [ a, b, c, d,
>>> e, f, g, h, i, j ]
>>>
>>> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as
>>> the result.
>>
>> FWIW that is the result from OFFSET 2 and LIMIT 2
> Typo
> OFFSET 3 and LIMIT 2
>
>>
>>>
>>> It should return 3 items instead of 2, where is the Jena code that I
>>> can debug to find out the issue (likely bug in my code)?
>>
>> Anything with the word "slice" in it.
>>
>>     QueryIterSlice
>>     OpSlice
>>
>> And ARQ does TopN optimization ORDER+OFFSET+LIMIT is executed
>> differently and more efficiently.  (O(n) not O(n log n))
>>
>>     QueryIterTopN
>>     OpTopN
>>
>>>
>>>
>>> Thanks,
>>>
>>> Z
>>>
>>
>

Re: question on LIMIT

Posted by Zen 98052 <z9...@outlook.com>.

Hi Andy,
I don't see any of my code referencing QueryIterSlice or OpSlice.
To answer Martynas's question, here is the query without the offset:

# What languages are most popular in Wikipedia
SELECT ?name ?no_entities {
    { SELECT ?language (COUNT(?x) as ?no_entities)
    WHERE {
        ?x dbpediaont:programmingLanguage ?language .
    }
    GROUP BY ?language
    ORDER BY DESC(?no_entities)
    LIMIT 10
    } .
    ?language foaf:name ?name
}
ORDER BY DESC(?no_entities)

And it returns:

  "results": {
    "bindings": [
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "C" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "932" }
      } ,
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "C++" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "930" }
      } ,
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "Java" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "514" }
      } ,
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "Python" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "354" }
      } ,
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
      } ,
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
      } ,
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "Perl" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "99" }
      } ,
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "Objective-C" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "75" }
      }
    ]
  }


If I modify the query to add OFFSET and also change the number on LIMIT, for example:

# What languages are most popular in Wikipedia
SELECT ?name ?no_entities {
    { SELECT ?language (COUNT(?x) as ?no_entities)
    WHERE {
        ?x dbpediaont:programmingLanguage ?language .
    }
    GROUP BY ?language
    ORDER BY DESC(?no_entities)
    OFFSET 4
    LIMIT 3
    } .
    ?language foaf:name ?name
}
ORDER BY DESC(?no_entities)

Now I get this result:

  "results": {
    "bindings": [
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
      } ,
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
        "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
      }
    ]
  }


Is something wrong with the SPARQL query?


Thanks,
Z

________________________________________
From: Andy Seaborne <an...@apache.org>
Sent: Tuesday, September 22, 2015 10:50 AM
To: users@jena.apache.org
Subject: Re: question on LIMIT

On 22/09/15 15:42, Andy Seaborne wrote:
> On 22/09/15 15:25, Zen 98052 wrote:
>> For example, the query result set (with ORDER clause) is [ a, b, c, d,
>> e, f, g, h, i, j ]
>>
>> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as
>> the result.
>
> FWIW that is the result from OFFSET 2 and LIMIT 2
Typo
OFFSET 3 and LIMIT 2

>
>>
>> It should return 3 items instead of 2, where is the Jena code that I
>> can debug to find out the issue (likely bug in my code)?
>
> Anything with the word "slice" in it.
>
>    QueryIterSlice
>    OpSlice
>
> And ARQ does TopN optimization ORDER+OFFSET+LIMIT is executed
> differently and more efficiently.  (O(n) not O(n log n))
>
>    QueryIterTopN
>    OpTopN
>
>>
>>
>> Thanks,
>>
>> Z
>>
>

Re: question on LIMIT

Posted by Andy Seaborne <an...@apache.org>.

On 22/09/15 15:42, Andy Seaborne wrote:
> On 22/09/15 15:25, Zen 98052 wrote:
>> For example, the query result set (with ORDER clause) is [ a, b, c, d,
>> e, f, g, h, i, j ]
>>
>> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as
>> the result.
>
> FWIW that is the result from OFFSET 2 and LIMIT 2
Typo
OFFSET 3 and LIMIT 2
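
As a quick check of the arithmetic, OFFSET/LIMIT can be modelled as a
list slice. This is a small Python sketch, not Jena code; slice_rows is
a name made up for the illustration.

```python
results = list("abcdefghij")  # the ordered result set from the original question


def slice_rows(rows, offset, limit):
    """Skip `offset` rows, then keep at most `limit` rows."""
    return rows[offset:offset + limit]


expected = slice_rows(results, 4, 3)  # what OFFSET 4 LIMIT 3 should give
observed = slice_rows(results, 3, 2)  # the slice that matches [ d, e ]

print(expected)  # ['e', 'f', 'g']
print(observed)  # ['d', 'e']
```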

>
>>
>> It should return 3 items instead of 2, where is the Jena code that I
>> can debug to find out the issue (likely bug in my code)?
>
> Anything with the word "slice" in it.
>
>    QueryIterSlice
>    OpSlice
>
> And ARQ does TopN optimization ORDER+OFFSET+LIMIT is executed
> differently and more efficiently.  (O(n) not O(n log n))
>
>    QueryIterTopN
>    OpTopN
>
>>
>>
>> Thanks,
>>
>> Z
>>
>

Re: question on LIMIT

Posted by Andy Seaborne <an...@apache.org>.

On 22/09/15 15:25, Zen 98052 wrote:
> For example, the query result set (with ORDER clause) is [ a, b, c, d, e, f, g, h, i, j ]
>
> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as the result.

FWIW that is the result from OFFSET 2 and LIMIT 2

>
> It should return 3 items instead of 2, where is the Jena code that I can debug to find out the issue (likely bug in my code)?

Anything with the word "slice" in it.

   QueryIterSlice
   OpSlice

And ARQ does a TopN optimization: ORDER+OFFSET+LIMIT is executed
differently and more efficiently (O(n), not O(n log n)).

   QueryIterTopN
   OpTopN
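
The general idea behind a top-N evaluation can be sketched in Python
with a bounded heap. This is only an illustration of the technique, not
ARQ's actual implementation: top_n_slice is a hypothetical helper, and
the heap-based version costs O(n log k) for k = OFFSET + LIMIT, which is
close to linear in n when k is small. The sample counts are borrowed
from the result listing elsewhere in this thread.

```python
import heapq


def top_n_slice(rows, key, offset, limit):
    """Return rows[offset:offset+limit] of the ordering by `key` without a full sort."""
    # Only the first offset+limit rows of the ordering can appear in the result,
    # so keep just those instead of sorting every row.
    top = heapq.nsmallest(offset + limit, rows, key=key)
    return top[offset:]


rows = [("C", 932), ("Perl", 99), ("Java", 514), ("C++", 930), ("Python", 354)]

# Equivalent of ORDER BY DESC(count) OFFSET 1 LIMIT 2 (negate the count to sort descending).
page = top_n_slice(rows, key=lambda r: -r[1], offset=1, limit=2)
print(page)  # [('C++', 930), ('Java', 514)]
```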

>
>
> Thanks,
>
> Z
>