Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2022/10/18 11:35:08 UTC
SPARQL limit doesn't work
I have a bigger query that starts with an inner select:
{ SELECT ?s ?score WHERE {
(?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" ) .
} order by desc(?score) offset 0 limit 1000 }
There are about 10000 results. With limit 1000 the query returns ~560 results,
and with limit 100 ~75. How do I page results correctly?
Re: SPARQL limit doesn't work
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
I had to reset all Jena data because the server ran out of memory during
a drop graph. With clean data, paging now works. I'll let you know if the
problem recurs.
On 20/10/2022 9.37, Lorenz Buehmann wrote:
>
> On 19.10.22 13:44, Mikael Pesonen wrote:
>>
>>
>>
>> On 19/10/2022 10.18, Lorenz Buehmann wrote:
>>> Honestly - probably because of lack of knowledge - I don't see how
>>> that can happen with the text index. You have a single triple
>>> pattern that is querying the Lucene index for the given pattern and
>>> returns by default at most 10 000 documents.
>>>
>>>> text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )
>>> translates to
>>>
>>>> ( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
>>> which indeed can return duplicate documents, as a separate document
>>> is created and indexed for each triple.
>>>
>>> I still don't get how a query with limit 1000 that returns 560
>>> results then doesn't return 100 when using limit 100.
>>>
>>> Currently, I find your results quite counterintuitive, but I still
>>> have a lot to learn about RDF, SPARQL and Jena.
>>>
>>>
>>> Can you share some data to reproduce this, please?
>> Unfortunately, I can't share the data. Of course, when I have time, I
>> could create a similar dummy index.
>>>
>>> What happens for a single property only?
>>
>> What does this mean?
> You're querying two properties, i.e. two fields in the Lucene query.
> What if you use just skos:prefLabel?
>>
>>> Pagination should work the way you're doing it: the Lucene query is
>>> executed internally once and then cached, so later requests should
>>> reuse the same Lucene document hits.
>>>
>>> On 19.10.22 08:21, Mikael Pesonen wrote:
>>>>
>>>> Hi,
>>>>
>>>> Yes, the same select run as the only query returns exactly the limit number of results.
>>>>
>>>> On 18/10/2022 16.48, Lorenz Buehmann wrote:
>>>>> Did you get those results when running only this subquery? AFAIK,
>>>>> the default limit of the Lucene text query is at most 10 000
>>>>> documents, and I don't think the outer LIMIT makes it into the
>>>>> Lucene request.
>>>>>
>>>>>
>>>>> On 18.10.22 13:35, Mikael Pesonen wrote:
>>>>>>
>>>>>> I have a bigger query that starts with an inner select:
>>>>>>
>>>>>> { SELECT ?s ?score WHERE {
>>>>>> (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\""
>>>>>> "lang:en" ) .
>>>>>> } order by desc(?score) offset 0 limit 1000 }
>>>>>>
>>>>>> There are about 10000 results. With limit 1000 the query returns
>>>>>> ~560 results, and with limit 100 ~75. How do I page results correctly?
>>>>
>>
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
Re: Re: SPARQL limit doesn't work
Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
On 19.10.22 13:44, Mikael Pesonen wrote:
>
>
>
> On 19/10/2022 10.18, Lorenz Buehmann wrote:
>> Honestly - probably because of lack of knowledge - I don't see how
>> that can happen with the text index. You have a single triple pattern
>> that is querying the Lucene index for the given pattern and returns
>> by default at most 10 000 documents.
>>
>>> text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )
>> translates to
>>
>>> ( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
>> which indeed can return duplicate documents, as a separate document
>> is created and indexed for each triple.
>>
>> I still don't get how a query with limit 1000 that returns 560
>> results then doesn't return 100 when using limit 100.
>>
>> Currently, I find your results quite counterintuitive, but I still
>> have a lot to learn about RDF, SPARQL and Jena.
>>
>>
>> Can you share some data to reproduce this, please?
> Unfortunately, I can't share the data. Of course, when I have time, I
> could create a similar dummy index.
>>
>> What happens for a single property only?
>
> What does this mean?
You're querying two properties, i.e. two fields in the Lucene query. What
if you use just skos:prefLabel?
Re: SPARQL limit doesn't work
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
On 19/10/2022 10.18, Lorenz Buehmann wrote:
> Honestly - probably because of lack of knowledge - I don't see how
> that can happen with the text index. You have a single triple pattern
> that is querying the Lucene index for the given pattern and returns by
> default at most 10 000 documents.
>
>> text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )
> translates to
>
>> ( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
> which indeed can return duplicate documents, as a separate document is
> created and indexed for each triple.
>
> I still don't get how a query with limit 1000 that returns 560 results
> then doesn't return 100 when using limit 100.
>
> Currently, I find your results quite counterintuitive, but I still
> have a lot to learn about RDF, SPARQL and Jena.
>
>
> Can you share some data to reproduce this, please?
Unfortunately, I can't share the data. Of course, when I have time, I
could create a similar dummy index.
>
> What happens for a single property only?
What does this mean?
Re: Re: SPARQL limit doesn't work
Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
Honestly - probably because of lack of knowledge - I don't see how that
can happen with the text index. You have a single triple pattern that is
querying the Lucene index for the given pattern and returns by default
at most 10 000 documents.
> text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )
translates to
> ( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
which indeed can return duplicate documents, as a separate document is
created and indexed for each triple.
I still don't get how a query with limit 1000 that returns 560 results
then doesn't return 100 when using limit 100.
Currently, I find your results quite counterintuitive, but I still have
a lot to learn about RDF, SPARQL and Jena.
Can you share some data to reproduce this, please?
What happens for a single property only? Pagination should work the way
you're doing it: the Lucene query is executed internally once and then
cached, so later requests should reuse the same Lucene document hits.
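To illustrate the point about one indexed document per triple, here is a toy model in Python. It is not Jena or Lucene code, only a sketch under the assumption that every labelled triple becomes one scored hit; the names Hit, search and distinct_subjects and the ex: data are made up for the example. It shows that a hit list capped at a limit can contain fewer distinct subjects than the limit. It is not meant as a diagnosis of the numbers reported in this thread.

```python
from collections import namedtuple

# One index entry per (subject, property, literal) triple (assumption of this sketch).
Hit = namedtuple("Hit", ["subject", "field", "score"])


def search(index, fields, limit):
    """Return the top `limit` hits whose field is in `fields`, best score first."""
    hits = [h for h in index if h.field in fields]
    # Sort by descending score; subject and field break ties deterministically.
    hits.sort(key=lambda h: (-h.score, h.subject, h.field))
    return hits[:limit]


def distinct_subjects(hits):
    """Collapse hits to unique subjects, keeping the order of first appearance."""
    seen = []
    for h in hits:
        if h.subject not in seen:
            seen.append(h.subject)
    return seen


# Hypothetical data: ex:a matches on both labels, ex:b and ex:c on one each.
index = [
    Hit("ex:a", "prefLabel", 2.0),
    Hit("ex:a", "altLabel", 1.5),
    Hit("ex:b", "prefLabel", 1.2),
    Hit("ex:c", "altLabel", 1.0),
]

both = search(index, {"prefLabel", "altLabel"}, limit=3)
single = search(index, {"prefLabel"}, limit=3)

# With both fields, three hits cover only two subjects, because ex:a appears twice.
print(len(both), distinct_subjects(both))
# With a single field, each subject appears at most once in this data.
print(len(single), distinct_subjects(single))
```

Comparing the two calls is the same experiment as the "single property only" question: if the counts differ between one and two properties, duplicates per subject are a likely factor.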
Re: SPARQL limit doesn't work
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi,
Yes, the same select run as the only query returns exactly the limit number of results.
Re: SPARQL limit doesn't work
Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.
Did you get those results when running only this subquery? AFAIK, the
default limit of the Lucene text query is at most 10 000 documents, and
I don't think the outer LIMIT makes it into the Lucene request.
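For the paging question itself, a minimal Python sketch of generating one query per page follows. It is an illustration only: QUERY_TEMPLATE and page_query are hypothetical names, the search string is the one from the original post, and the extra ?s sort key is an assumption added here so that rows with equal scores keep a fixed order from page to page. The cap on Lucene hits mentioned above applies independently of OFFSET and LIMIT.

```python
# Hypothetical query template; the text: and skos: prefixes are the standard
# Jena text and SKOS namespaces. Doubled braces are literal braces for str.format.
QUERY_TEMPLATE = '''PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?s ?score WHERE {{
  (?s ?score) text:query (skos:prefLabel skos:altLabel "\\"xx yy\\"" "lang:en") .
}}
ORDER BY DESC(?score) ?s
OFFSET {offset} LIMIT {limit}'''


def page_query(page, page_size):
    """Build the SPARQL text for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return QUERY_TEMPLATE.format(offset=page * page_size, limit=page_size)


# First page, then only the paging clause of the third page.
print(page_query(0, 1000))
print(page_query(2, 1000).splitlines()[-1])
```

Without a complete ordering, SPARQL does not guarantee that consecutive OFFSET/LIMIT windows line up, which is why the sketch sorts on ?s after the score.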