Posted to users@jena.apache.org by Zen 98052 <z9...@outlook.com> on 2015/09/22 16:25:10 UTC
question on LIMIT
For example, the query result set (with ORDER clause) is [ a, b, c, d, e, f, g, h, i, j ]
When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as the result.
It should return 3 items instead of 2, where is the Jena code that I can debug to find out the issue (likely bug in my code)?
Thanks,
Z
Re: question on LIMIT
Posted by Martynas Jusevičius <ma...@graphity.org>.
Please post your query first.
On Tue, Sep 22, 2015 at 4:25 PM, Zen 98052 <z9...@outlook.com> wrote:
> For example, the query result set (with ORDER clause) is [ a, b, c, d, e, f, g, h, i, j ]
>
> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as the result.
>
> It should return 3 items instead of 2, where is the Jena code that I can debug to find out the issue (likely bug in my code)?
>
>
> Thanks,
>
> Z
Re: question on LIMIT
Posted by Zen 98052 <z9...@outlook.com>.
2,482 ms vs. 9,119 ms vs. 7,991 ms
________________________________________
From: Andy Seaborne <an...@apache.org>
Sent: Tuesday, September 22, 2015 3:50 PM
To: users@jena.apache.org
Subject: Re: question on LIMIT
On 22/09/15 19:06, Zen 98052 wrote:
> Thanks Andy! All those 3 queries work as expected. Just an
> information, the first query (putting OFFSET/LIMIT on the top most
> query) is the fastest, second one is query that includes ?name in
> grouping, and the last one is the one using SAMPLE function.
How much difference is there between the different times?
Andy
>
>
> Thanks, Z
Re: question on LIMIT
Posted by Andy Seaborne <an...@apache.org>.
On 22/09/15 19:06, Zen 98052 wrote:
> Thanks Andy! All those 3 queries work as expected. Just an
> information, the first query (putting OFFSET/LIMIT on the top most
> query) is the fastest, second one is query that includes ?name in
> grouping, and the last one is the one using SAMPLE function.
How much difference is there between the different times?
Andy
>
>
> Thanks, Z
Re: question on LIMIT
Posted by Zen 98052 <z9...@outlook.com>.
Thanks Andy! All those 3 queries work as expected.
Just an information, the first query (putting OFFSET/LIMIT on the top most query) is the fastest, second one is query that includes ?name in grouping, and the last one is the one using SAMPLE function.
Thanks,
Z
________________________________________
From: Andy Seaborne <an...@apache.org>
Sent: Tuesday, September 22, 2015 12:30 PM
To: users@jena.apache.org
Subject: Re: question on LIMIT
On 22/09/15 17:02, Zen 98052 wrote:
> Hi Andy,
> I don't see any of my code referencing to QueryIterSlice or OpSlice.
> To answer question from Martynas, here is the query without offset:
>
> # What languages are most popular in Wikipedia
> SELECT ?name ?no_entities {
> { SELECT ?language (COUNT(?x) as ?no_entities)
> WHERE {
> ?x dbpediaont:programmingLanguage ?language .
> }
> GROUP BY ?language
> ORDER BY DESC(?no_entities)
> LIMIT 10
> } .
> ?language foaf:name ?name
> }
> ORDER BY DESC(?no_entities)
>
> And it returns:
>
> "results": {
> "bindings": [
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "C" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "932" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "C++" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "930" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "Java" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "514" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "Python" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "354" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "Perl" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "99" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "Objective-C" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "75" }
> }
> ]
> }
>
>
> If I modify the query to add OFFSET and also change the number on LIMIT, for example:
>
> # What languages are most popular in Wikipedia
> SELECT ?name ?no_entities {
> { SELECT ?language (COUNT(?x) as ?no_entities)
> WHERE {
> ?x dbpediaont:programmingLanguage ?language .
> }
> GROUP BY ?language
> ORDER BY DESC(?no_entities)
> OFFSET 4
> LIMIT 3
> } .
> ?language foaf:name ?name
> }
> ORDER BY DESC(?no_entities)
>
> Now I got this result:
>
> "results": {
> "bindings": [
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
> }
> ]
> }
>
>
> Something wrong with the SPARQL query?
Yes.
You get the OFFSET 4 / LIMIT 3 rows out of the inner part, but they are then
processed further with a join, and some of them may not match.
Try putting the OFFSET/LIMIT on the topmost query:
# What languages are most popular in Wikipedia
SELECT ?name ?no_entities {
{ SELECT ?language (COUNT(?x) as ?no_entities)
WHERE {
?x dbpediaont:programmingLanguage ?language .
}
GROUP BY ?language
ORDER BY DESC(?no_entities)
} .
?language foaf:name ?name
}
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3
----------------
Or remove the inner query and use SAMPLE to pick a name (is there
only one name per ?language?):
SELECT (SAMPLE(?_name) AS ?name) ?language (COUNT(?x) as ?no_entities)
{
?x dbpediaont:programmingLanguage ?language .
?language foaf:name ?_name
}
GROUP BY ?language
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3
or include ?name in grouping:
SELECT ?name ?language (COUNT(?x) as ?no_entities)
{
?x dbpediaont:programmingLanguage ?language .
?language foaf:name ?name
}
GROUP BY ?language ?name ## Include ?name.
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3
>
>
> Thanks,
> Z
>
> ________________________________________
> From: Andy Seaborne <an...@apache.org>
> Sent: Tuesday, September 22, 2015 10:50 AM
> To: users@jena.apache.org
> Subject: Re: question on LIMIT
>
> On 22/09/15 15:42, Andy Seaborne wrote:
>> On 22/09/15 15:25, Zen 98052 wrote:
>>> For example, the query result set (with ORDER clause) is [ a, b, c, d,
>>> e, f, g, h, i, j ]
>>>
>>> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as
>>> the result.
>>
>> FWIW that is the result from OFFSET 2 and LIMIT 2
> Typo
> OFFSET 3 and LIMIT 2
>
>>
>>>
>>> It should return 3 items instead of 2, where is the Jena code that I
>>> can debug to find out the issue (likely bug in my code)?
>>
>> Anything with the word "slice" in it.
>>
>> QueryIterSlice
>> OpSlice
>>
>> And ARQ does TopN optimization ORDER+OFFSET+LIMIT is executed
>> differently and more efficiently. (O(n) not O(n log n))
>>
>> QueryIterTopN
>> OpTopN
>>
>>>
>>>
>>> Thanks,
>>>
>>> Z
>>>
>>
>
Re: question on LIMIT
Posted by Andy Seaborne <an...@apache.org>.
On 22/09/15 17:02, Zen 98052 wrote:
> Hi Andy,
> I don't see any of my code referencing to QueryIterSlice or OpSlice.
> To answer question from Martynas, here is the query without offset:
>
> # What languages are most popular in Wikipedia
> SELECT ?name ?no_entities {
> { SELECT ?language (COUNT(?x) as ?no_entities)
> WHERE {
> ?x dbpediaont:programmingLanguage ?language .
> }
> GROUP BY ?language
> ORDER BY DESC(?no_entities)
> LIMIT 10
> } .
> ?language foaf:name ?name
> }
> ORDER BY DESC(?no_entities)
>
> And it returns:
>
> "results": {
> "bindings": [
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "C" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "932" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "C++" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "930" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "Java" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "514" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "Python" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "354" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "Perl" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "99" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "Objective-C" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "75" }
> }
> ]
> }
>
>
> If I modify the query to add OFFSET and also change the number on LIMIT, for example:
>
> # What languages are most popular in Wikipedia
> SELECT ?name ?no_entities {
> { SELECT ?language (COUNT(?x) as ?no_entities)
> WHERE {
> ?x dbpediaont:programmingLanguage ?language .
> }
> GROUP BY ?language
> ORDER BY DESC(?no_entities)
> OFFSET 4
> LIMIT 3
> } .
> ?language foaf:name ?name
> }
> ORDER BY DESC(?no_entities)
>
> Now I got this result:
>
> "results": {
> "bindings": [
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
> } ,
> {
> "name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
> "no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
> }
> ]
> }
>
>
> Something wrong with the SPARQL query?
Yes.
You get the OFFSET 4 / LIMIT 3 rows out of the inner part, but they are then
processed further with a join, and some of them may not match.
Try putting the OFFSET/LIMIT on the topmost query:
# What languages are most popular in Wikipedia
SELECT ?name ?no_entities {
{ SELECT ?language (COUNT(?x) as ?no_entities)
WHERE {
?x dbpediaont:programmingLanguage ?language .
}
GROUP BY ?language
ORDER BY DESC(?no_entities)
} .
?language foaf:name ?name
}
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3
----------------
Or remove the inner query and use SAMPLE to pick a name (is there
only one name per ?language?):
SELECT (SAMPLE(?_name) AS ?name) ?language (COUNT(?x) as ?no_entities)
{
?x dbpediaont:programmingLanguage ?language .
?language foaf:name ?_name
}
GROUP BY ?language
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3
or include ?name in grouping:
SELECT ?name ?language (COUNT(?x) as ?no_entities)
{
?x dbpediaont:programmingLanguage ?language .
?language foaf:name ?name
}
GROUP BY ?language ?name ## Include ?name.
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3
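The effect of slicing before the join can be modelled outside SPARQL. The sketch below is only an illustration in Python, not Jena code: the language keys, counts and names are made-up data, and the join is simplified to at most one foaf:name per language. It shows why OFFSET/LIMIT inside the subquery can yield fewer rows than LIMIT asks for, while the same slice on the topmost query does not.

```python
# Hypothetical output of the inner GROUP BY / ORDER BY, highest count first.
inner = [("L1", 90), ("L2", 80), ("L3", 70), ("L4", 60),
         ("L5", 50), ("L6", 40), ("L7", 30), ("L8", 20)]

# Hypothetical foaf:name data: L6 has no name, so it cannot join.
names = {"L1": "One", "L2": "Two", "L3": "Three", "L4": "Four",
         "L5": "Five", "L7": "Seven", "L8": "Eight"}


def slice_rows(rows, offset, limit):
    """OFFSET/LIMIT: skip `offset` rows, then keep at most `limit` rows."""
    return rows[offset:offset + limit]


def join_names(rows, names):
    """Inner join on the language key; rows without a name are dropped."""
    return [(names[lang], count) for lang, count in rows if lang in names]


# Slice inside the subquery, then join: one of the three rows is lost.
slice_then_join = join_names(slice_rows(inner, 4, 3), names)

# Join first, then slice on the outer query: three rows come back.
join_then_slice = slice_rows(join_names(inner, names), 4, 3)

print(slice_then_join)  # [('Five', 50), ('Seven', 30)]
print(join_then_slice)  # [('Five', 50), ('Seven', 30), ('Eight', 20)]
```

With real data a language may also carry several names, in which case the join adds rows instead of dropping them; the SAMPLE and GROUP BY variants above address that case.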
>
>
> Thanks,
> Z
>
> ________________________________________
> From: Andy Seaborne <an...@apache.org>
> Sent: Tuesday, September 22, 2015 10:50 AM
> To: users@jena.apache.org
> Subject: Re: question on LIMIT
>
> On 22/09/15 15:42, Andy Seaborne wrote:
>> On 22/09/15 15:25, Zen 98052 wrote:
>>> For example, the query result set (with ORDER clause) is [ a, b, c, d,
>>> e, f, g, h, i, j ]
>>>
>>> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as
>>> the result.
>>
>> FWIW that is the result from OFFSET 2 and LIMIT 2
> Typo
> OFFSET 3 and LIMIT 2
>
>>
>>>
>>> It should return 3 items instead of 2, where is the Jena code that I
>>> can debug to find out the issue (likely bug in my code)?
>>
>> Anything with the word "slice" in it.
>>
>> QueryIterSlice
>> OpSlice
>>
>> And ARQ does TopN optimization ORDER+OFFSET+LIMIT is executed
>> differently and more efficiently. (O(n) not O(n log n))
>>
>> QueryIterTopN
>> OpTopN
>>
>>>
>>>
>>> Thanks,
>>>
>>> Z
>>>
>>
>
Re: question on LIMIT
Posted by Zen 98052 <z9...@outlook.com>.
Hi Andy,
I don't see any of my code referencing to QueryIterSlice or OpSlice.
To answer question from Martynas, here is the query without offset:
# What languages are most popular in Wikipedia
SELECT ?name ?no_entities {
{ SELECT ?language (COUNT(?x) as ?no_entities)
WHERE {
?x dbpediaont:programmingLanguage ?language .
}
GROUP BY ?language
ORDER BY DESC(?no_entities)
LIMIT 10
} .
?language foaf:name ?name
}
ORDER BY DESC(?no_entities)
And it returns:
"results": {
"bindings": [
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "C" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "932" }
} ,
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "C++" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "930" }
} ,
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "Java" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "514" }
} ,
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "Python" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "354" }
} ,
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
} ,
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
} ,
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "Perl" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "99" }
} ,
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "Objective-C" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "75" }
}
]
}
If I modify the query to add OFFSET and also change the number on LIMIT, for example:
# What languages are most popular in Wikipedia
SELECT ?name ?no_entities {
{ SELECT ?language (COUNT(?x) as ?no_entities)
WHERE {
?x dbpediaont:programmingLanguage ?language .
}
GROUP BY ?language
ORDER BY DESC(?no_entities)
OFFSET 4
LIMIT 3
} .
?language foaf:name ?name
}
ORDER BY DESC(?no_entities)
Now I got this result:
"results": {
"bindings": [
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "JavaScript" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "213" }
} ,
{
"name": { "type": "literal" , "xml:lang": "en" , "value": "C#" } ,
"no_entities": { "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "type": "typed-literal" , "value": "122" }
}
]
}
Something wrong with the SPARQL query?
Thanks,
Z
________________________________________
From: Andy Seaborne <an...@apache.org>
Sent: Tuesday, September 22, 2015 10:50 AM
To: users@jena.apache.org
Subject: Re: question on LIMIT
On 22/09/15 15:42, Andy Seaborne wrote:
> On 22/09/15 15:25, Zen 98052 wrote:
>> For example, the query result set (with ORDER clause) is [ a, b, c, d,
>> e, f, g, h, i, j ]
>>
>> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as
>> the result.
>
> FWIW that is the result from OFFSET 2 and LIMIT 2
Typo
OFFSET 3 and LIMIT 2
>
>>
>> It should return 3 items instead of 2, where is the Jena code that I
>> can debug to find out the issue (likely bug in my code)?
>
> Anything with the word "slice" in it.
>
> QueryIterSlice
> OpSlice
>
> And ARQ does TopN optimization ORDER+OFFSET+LIMIT is executed
> differently and more efficiently. (O(n) not O(n log n))
>
> QueryIterTopN
> OpTopN
>
>>
>>
>> Thanks,
>>
>> Z
>>
>
Re: question on LIMIT
Posted by Andy Seaborne <an...@apache.org>.
On 22/09/15 15:42, Andy Seaborne wrote:
> On 22/09/15 15:25, Zen 98052 wrote:
>> For example, the query result set (with ORDER clause) is [ a, b, c, d,
>> e, f, g, h, i, j ]
>>
>> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as
>> the result.
>
> FWIW that is the result from OFFSET 2 and LIMIT 2
Typo
OFFSET 3 and LIMIT 2
>
>>
>> It should return 3 items instead of 2, where is the Jena code that I
>> can debug to find out the issue (likely bug in my code)?
>
> Anything with the word "slice" in it.
>
> QueryIterSlice
> OpSlice
>
> And ARQ does TopN optimization ORDER+OFFSET+LIMIT is executed
> differently and more efficiently. (O(n) not O(n log n))
>
> QueryIterTopN
> OpTopN
>
>>
>>
>> Thanks,
>>
>> Z
>>
>
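For reference, the slice arithmetic in this exchange can be checked with a few lines of Python. This is a plain model of OFFSET/LIMIT over an already ordered list, not the QueryIterSlice code; the function name slice_rows is made up for the illustration.

```python
def slice_rows(rows, offset=0, limit=None):
    """Skip `offset` rows, then keep at most `limit` rows (None = no limit)."""
    end = None if limit is None else offset + limit
    return rows[offset:end]


rows = list("abcdefghij")

print(slice_rows(rows, offset=4, limit=3))  # ['e', 'f', 'g']
print(slice_rows(rows, offset=3, limit=2))  # ['d', 'e']
```

So [ d, e ] is what OFFSET 3 and LIMIT 2 would give on the example list, while OFFSET 4 and LIMIT 3 on a single ordered result gives three rows starting at e.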
Re: question on LIMIT
Posted by Andy Seaborne <an...@apache.org>.
On 22/09/15 15:25, Zen 98052 wrote:
> For example, the query result set (with ORDER clause) is [ a, b, c, d, e, f, g, h, i, j ]
>
> When I put OFFSET 4 and LIMIT 3 in the query, I got back [ d, e ] as the result.
FWIW that is the result from OFFSET 2 and LIMIT 2
>
> It should return 3 items instead of 2, where is the Jena code that I can debug to find out the issue (likely bug in my code)?
Anything with the word "slice" in it.
QueryIterSlice
OpSlice
And ARQ does TopN optimization ORDER+OFFSET+LIMIT is executed
differently and more efficiently. (O(n) not O(n log n))
QueryIterTopN
OpTopN
>
>
> Thanks,
>
> Z
>
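The idea behind a top-N evaluation of ORDER + OFFSET + LIMIT can be sketched in Python as follows. This is not ARQ's QueryIterTopN, only an illustration under the assumption that keeping the best offset + limit rows in a bounded heap stands in for sorting everything; the function name and the sample rows are invented for the example.

```python
import heapq


def top_n_slice(rows, key, offset, limit):
    """ORDER BY DESC(key) OFFSET offset LIMIT limit without a full sort.

    Only the best offset + limit rows are kept while scanning, so the cost
    grows with the input size times log(offset + limit), not with a full
    sort of the input.
    """
    top = heapq.nlargest(offset + limit, rows, key=key)
    return top[offset:]


# Hypothetical (label, count) rows with distinct counts.
rows = [("a", 5), ("b", 9), ("c", 1), ("d", 7), ("e", 3), ("f", 8)]

result = top_n_slice(rows, key=lambda r: r[1], offset=1, limit=2)
print(result)  # [('f', 8), ('d', 7)]
```

The result matches sorting the whole list in descending order and then slicing, which is the behaviour the plain slice path would give.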