Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2022/10/18 11:35:08 UTC

SPARQL limit doesn't work

I have a larger query that starts with an inner SELECT:

  { SELECT ?s ?score WHERE {
      (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en") .
    } ORDER BY DESC(?score) OFFSET 0 LIMIT 1000 }

There are about 10000 results, but LIMIT 1000 returns ~560 results and
LIMIT 100 returns ~75. How do I page through the results correctly?

Re: SPARQL limit doesn't work

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
I had to reset all Jena data because the server ran out of memory during
a DROP GRAPH. With clean data, paging now works. I'll let you know if the
problem recurs.

On 20/10/2022 9.37, Lorenz Buehmann wrote:
> you're querying two properties aka two fields in the Lucene query. 
> What if you just use skos:prefLabel ?

-- 
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND


Re: Re: SPARQL limit doesn't work

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
On 19.10.22 13:44, Mikael Pesonen wrote:
>> What happens for a single property only? 
>
> What does this mean?
You're querying two properties, i.e. two fields in the Lucene query. What
happens if you use only skos:prefLabel?
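
For illustration, the single-property variant of the subquery might look like the sketch below. It reuses the search string and language argument from the original question; the PREFIX declarations are assumed to be the standard jena-text and SKOS namespaces, and the subquery is shown as a standalone query so it can be run on its own.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Same text search as in the original question, restricted to one property
# (one Lucene field) instead of two.
SELECT ?s ?score WHERE {
  (?s ?score) text:query (skos:prefLabel "\"xx yy\"" "lang:en") .
}
ORDER BY DESC(?score)
OFFSET 0 LIMIT 1000
```

If this version returns exactly LIMIT rows while the two-property version does not, that would point at the combination of fields rather than at paging itself.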

Re: SPARQL limit doesn't work

Posted by Mikael Pesonen <mi...@lingsoft.fi>.


On 19/10/2022 10.18, Lorenz Buehmann wrote:
> Can you share some data please to reproduce?
Unfortunately I can't share the data. When I have time, I could of course
create a similar dummy index.
>
> What happens for a single property only? 

What does this mean?



Re: Re: SPARQL limit doesn't work

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
Honestly, and probably for lack of knowledge on my part, I don't see how
that can happen with the text index. You have a single triple pattern that
queries the Lucene index for the given pattern and returns at most 10 000
documents by default.

> text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )
translates to

> ( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
which indeed can return duplicate documents as for each triple a 
separate document is created and indexed.
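
One way to check whether duplicate hits play a role is to compare the raw number of hits with the number of distinct subjects. This is only a diagnostic sketch: the PREFIX declarations are assumed to be the standard jena-text and SKOS namespaces, and the variable names ?hits and ?subjects are made up for the example.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# ?hits counts every row returned by the text index;
# ?subjects counts each matching resource only once.
SELECT (COUNT(*) AS ?hits) (COUNT(DISTINCT ?s) AS ?subjects) WHERE {
  (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en") .
}
```

If ?hits is clearly larger than ?subjects, the same resource is matching through more than one indexed document.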

I still don't understand how a query that returns 560 results with LIMIT
1000 does not return 100 results with LIMIT 100.

Currently, I find your results quite counterintuitive, but I still have
a lot to learn about RDF, SPARQL and Jena.


Can you share some data please to reproduce?

What happens with a single property only? Pagination should work the way
you're doing it: the Lucene query is executed once internally and then
cached, so later requests should reuse the same Lucene document hits.
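
As a sketch of paging that does not depend on duplicate hits, the subquery could collapse the rows per subject before OFFSET and LIMIT are applied. This assumes that duplicates are actually involved, which the thread does not confirm. The helper variable ?sc and the tie-breaking sort on ?s are choices made for this example, and the PREFIX declarations are assumed to be the standard jena-text and SKOS namespaces.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# One row per subject, keeping its best score.
# Sorting on ?s as well keeps the order stable between pages
# when several subjects have the same score.
SELECT ?s (MAX(?sc) AS ?score) WHERE {
  (?s ?sc) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en") .
}
GROUP BY ?s
ORDER BY DESC(?score) ?s
OFFSET 0 LIMIT 1000
```

The next page would use OFFSET 1000 with the same LIMIT and the same ORDER BY.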

On 19.10.22 08:21, Mikael Pesonen wrote:
>
> Hi,
>
> yes, same select as only query gets exactly limit amount of triples.

Re: SPARQL limit doesn't work

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,

Yes, when the same SELECT is run on its own as the only query, it returns
exactly the LIMIT number of results.

On 18/10/2022 16.48, Lorenz Buehmann wrote:
> did you get those results when running only this subquery? Afaik, the 
> default limit of the Lucene text query is at most 10 000 documents - 
> and I don't think that the outer LIMIT would make it to the Lucene 
> request


Re: SPARQL limit doesn't work

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
Did you get those results when running only this subquery? AFAIK, the
Lucene text query returns at most 10 000 documents by default, and I don't
think the outer LIMIT is passed through to the Lucene request.
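
If the default cap on the text query turns out to matter, jena-text accepts an optional integer limit inside the text:query argument list. The sketch below assumes the limit goes after the query string and before the "lang:" argument, as I recall from the jena-text documentation; check this against your Jena version. The value 20000 is an arbitrary example, and the PREFIX declarations are assumed to be the standard jena-text and SKOS namespaces.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# The integer 20000 is the limit passed to the Lucene request itself;
# the outer LIMIT only trims the SPARQL result afterwards.
SELECT ?s ?score WHERE {
  (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" 20000 "lang:en") .
}
ORDER BY DESC(?score)
OFFSET 0 LIMIT 1000
```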


On 18.10.22 13:35, Mikael Pesonen wrote:
>
> I have a bigger query that starts with inner select
>
>  { SELECT ?s ?score WHERE {
>     (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" 
> "lang:en" ) .
>     } order by desc(?score) offset 0 limit 1000 }
>
> There are about 10000 results. limit 1000 returns ~560 and limit 100 
> ~75 results. How do I page results correctly?