Posted to users@jena.apache.org by Matt Whitby <ma...@gmail.com> on 2021/12/07 19:55:04 UTC

Sparql Query

I have a Sparql question if that's okay.

There are only around 8m triples in our test data, so pretty small.

The query takes a good couple of minutes to run (and sometimes just times
out).

I dare say running an lcase against each field doesn't help matters, but I
don't know of any other way to do a case-insensitive search (well, regex, but
who likes regex?), so I'm not sure what else to try.

Any obvious ways to make it less bad?

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s ?name
where {

?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/county> ?county}.
OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/district/> ?district}.
OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/parish> ?parish}.

FILTER (CONTAINS(lcase(?county),"lewes") ||
        CONTAINS(lcase(?district),"lewes") ||
        CONTAINS(lcase(?parish),"lewes"))

}
limit 10

Re: Sparql Query

Posted by Pedro <pe...@googlemail.com.INVALID>.

Use the "i" flag (the third argument to REGEX) to make the regex case-insensitive.
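A minimal Python sketch (a model, not Jena code) of why the two filter styles select the same rows: `CONTAINS(LCASE(?x), "lewes")` lowercases and then does a substring test, while `REGEX(?x, "lewes", "i")` searches case-insensitively. The sample values are made up, and Python's `re` only approximates the XPath regex syntax SPARQL uses. As far as I know, TDB2 still has to test every candidate value either way, so the choice is about readability rather than speed.

```python
import re

# Made-up values standing in for ?county / ?district / ?parish bindings.
VALUES = ["Lewes", "LEWES DISTRICT", "East Sussex", "lewes", "Newhaven"]


def lcase_contains(value: str, term: str) -> bool:
    """Mirrors CONTAINS(LCASE(?x), term): lowercase first, then a substring test."""
    return term.lower() in value.lower()


def regex_i(value: str, term: str) -> bool:
    """Mirrors REGEX(?x, term, "i"): a case-insensitive search for the literal term."""
    return re.search(re.escape(term), value, re.IGNORECASE) is not None


def regex_filter(var: str, term: str) -> str:
    """Builds the SPARQL filter; assumes term has no regex metacharacters or quotes."""
    return f'FILTER (REGEX(?{var}, "{term}", "i"))'


by_lcase = [v for v in VALUES if lcase_contains(v, "lewes")]
by_regex = [v for v in VALUES if regex_i(v, "lewes")]
print(by_lcase)  # ['Lewes', 'LEWES DISTRICT', 'lewes']
print(by_regex == by_lcase)  # True
print(regex_filter("county", "lewes"))  # FILTER (REGEX(?county, "lewes", "i"))
```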

On Tue, 7 Dec 2021, 19:55 Matt Whitby, <ma...@gmail.com> wrote:

> I dare say running an lcase against each field doesn't help matters, but
> with no other way of doing a case-insensitive search (well, Regex - but who
> likes Regex?) I'm not sure.

Re: Sparql Query

Posted by Harri Kiiskinen <ha...@utu.fi>.

I may be wrong, but I think the problem could be the filters. As written,
you are always filtering on values from the optionals, even when there is
nothing to filter on. The SPARQL recommendation says that if the contents
of an OPTIONAL do not match, the solution is kept, but no bindings are
created. I guess this _could_ mean that in these cases the filters are run
on unbound variables, which in my experience leads to very slow queries.

I would try putting the filters inside the optionals, but that changes the
meaning of the query: each filter would then only restrict its own
OPTIONAL, not the overall result. Perhaps a HAVING filter at the end?


Harri Kiiskinen

On 8.12.2021 12.07, Lorenz Buehmann wrote:
> Even if it's not the strings leading to performance issues, using the 
> Jena text index might be definitely more efficient


-- 
Tutkijatohtori / post-doctoral researcher
Viral Culture in the Early Nineteenth-Century Europe (ViCE)
Movie Making Finland: Finnish fiction films as audiovisual big data, 
1907–2017 (MoMaF)
Turun yliopisto / University of Turku
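Harri's point about filters over unbound variables can be checked against the SPARQL 1.1 filter-evaluation rules. The Python sketch below is a model, not Jena code, and the solution data is invented: an unbound variable is `None`, so `CONTAINS(LCASE(?x), ...)` yields an evaluation error, and the three tests are folded with the spec's logical-or table, where `true || error` is true and `false || error` is an error. It suggests the results are correct even when some OPTIONALs fail to bind; whether those error paths are what makes the query slow remains an open question.

```python
from typing import Dict, Optional, Union

ERROR = "error"  # stands in for a SPARQL expression evaluation error
Result = Union[bool, str]


def contains_lcase(value: Optional[str], term: str) -> Result:
    """CONTAINS(LCASE(?x), term); an unbound ?x (None here) is an evaluation error."""
    if value is None:
        return ERROR
    return term in value.lower()


def sparql_or(a: Result, b: Result) -> Result:
    """Logical-or from the SPARQL 1.1 truth table: true beats error, error beats false."""
    if a is True or b is True:
        return True
    if a == ERROR or b == ERROR:
        return ERROR
    return False


def filter_keeps(solution: Dict[str, str], term: str) -> bool:
    """FILTER keeps a solution only when the expression is true; false and error drop it."""
    result: Result = False  # false is the identity for this or-table
    for var in ("county", "district", "parish"):
        result = sparql_or(result, contains_lcase(solution.get(var), term))
    return result is True


# Invented solutions after the three OPTIONALs; a missing key is an unbound variable.
SOLUTIONS = [
    {"s": "a", "county": "East Sussex", "district": "Lewes"},  # parish unbound
    {"s": "b"},  # nothing bound
    {"s": "c", "county": "Kent", "district": "Dover", "parish": "Deal"},
]

kept = [sol["s"] for sol in SOLUTIONS if filter_keeps(sol, "lewes")]
print(kept)  # ['a']
```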

Re: Sparql Query

Posted by Andy Seaborne <an...@apache.org>.


On 08/12/2021 23:22, Jeff Lerman wrote:
> Interesting Lorenz; thanks for that pointer!
> 
> nit: Looks like maybe the compatibility matrix needs to be updated for
> recent (>4.0) versions of Jena?

Fixed, thanks.


Re: Re: Re: Sparql Query

Posted by Jeff Lerman <je...@sironamedical.com>.

Interesting Lorenz; thanks for that pointer!

nit: Looks like maybe the compatibility matrix needs to be updated for
recent (>4.0) versions of Jena?

On Wed, Dec 8, 2021 at 3:42 AM Lorenz Buehmann <
buehmann@informatik.uni-leipzig.de> wrote:

> It does indeed, you just have to set it up initially, see docs:
> https://jena.apache.org/documentation/query/text-query.html

Re: Re: Re: Sparql Query

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.

It does indeed; you just have to set it up first. See the docs:
https://jena.apache.org/documentation/query/text-query.html
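As a rough sketch, assuming a text index has been configured (as in the linked docs) with the county, district and parish properties in its entity map, a query using the `text:query` property function could be generated as below. The helper name and the UNION layout are my own choices, and the property IRIs are copied from the original query. As I understand it, the default analyzer matches whole tokens case-insensitively, so this is not an exact substitute for a substring `CONTAINS`.

```python
TEXT_NS = "http://jena.apache.org/text#"
HE = "http://www.historicengland.org.uk/data/schema/"
# Property paths as written in the original query (district keeps its trailing "/").
INDEXED = ("county", "district/", "parish")


def text_index_query(term: str, props=INDEXED, limit: int = 10) -> str:
    """One text:query branch per indexed property, unioned, then joined to the name.

    Assumes each property is in the text index's entity map and that term
    contains no double quotes.
    """
    branches = "\n  UNION\n  ".join(
        f'{{ ?s text:query (<{HE}{p}> "{term}") }}' for p in props
    )
    return (
        f"PREFIX text: <{TEXT_NS}>\n"
        "SELECT DISTINCT ?s ?name WHERE {\n"
        f"  {branches}\n"
        f"  ?s <{HE}simplename/name> ?name .\n"
        "}\n"
        f"LIMIT {limit}"
    )


print(text_index_query("lewes"))
```

DISTINCT is there because a subject whose county and district both match would otherwise appear once per matching branch.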

On 08.12.21 11:47, Matt Whitby wrote:
> Jena has a text index?

Re: Re: Sparql Query

Posted by Matt Whitby <ma...@gmail.com>.

Jena has a text index?

On Wed, 8 Dec 2021 at 10:07, Lorenz Buehmann <
buehmann@informatik.uni-leipzig.de> wrote:

> Even if it's not the strings leading to performance issues, using the
> Jena text index might be definitely more efficient


-- 
Matt
Southend. Essex, England

Guff follows....

Me: http://www.about.me/matt.whitby


Photography: http://www.whitbyphoto.com


Travels: http://www.whitbyadventures.com


Music: http://www.last.fm/user/MattWhitby


Reading: https://www.goodreads.com/user_challenges/19398505


Development: https://www.hackerrank.com/matt_whitby

Re: Re: Sparql Query

Posted by Lorenz Buehmann <bu...@informatik.uni-leipzig.de>.

Even if the string functions are not what is causing the performance
issues, using the Jena text index would very likely be more efficient.

On 08.12.21 10:38, Matt Whitby wrote:
> Fuseki. No inference. TDB2.

Re: Sparql Query

Posted by Andy Seaborne <an...@apache.org>.

Is the data available somewhere?

On 08/12/2021 09:38, Matt Whitby wrote:
> Fuseki. No inference. TDB2.

Re: Sparql Query

Posted by Matt Whitby <ma...@gmail.com>.

Fuseki. No inference. TDB2.

M
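With Fuseki on TDB2, one way to follow the suggestion from Andy and Adam of removing one OPTIONAL (and its part of the filter) at a time is to script the variants and time them outside Postman. A standard-library Python sketch; the predicates are copied from the original query, including the trailing `/` on the district IRI, which may be worth double-checking against the data. The helper names are hypothetical.

```python
import time
import urllib.parse
import urllib.request

HE = "http://www.historicengland.org.uk/data/schema/"
# Predicates as written in the original query (note the trailing "/" on district).
OPTIONALS = {"county": HE + "county", "district": HE + "district/", "parish": HE + "parish"}


def build_query(variables, term="lewes", limit=10):
    """The original query shape, keeping only the given OPTIONALs and their FILTER tests."""
    lines = ["SELECT ?s ?name WHERE {", f"  ?s <{HE}simplename/name> ?name ."]
    for var in variables:
        lines.append(f"  OPTIONAL {{ ?s <{OPTIONALS[var]}> ?{var} }}")
    tests = " || ".join(f'CONTAINS(LCASE(?{var}), "{term}")' for var in variables)
    lines += [f"  FILTER ({tests})", "}", f"LIMIT {limit}"]
    return "\n".join(lines)


def ablation_variants():
    """The full query plus one variant per dropped OPTIONAL."""
    names = list(OPTIONALS)
    variants = {"all": build_query(names)}
    for dropped in names:
        variants[f"without {dropped}"] = build_query([n for n in names if n != dropped])
    return variants


def make_request(endpoint, query):
    """A SPARQL 1.1 Protocol query sent as a form-encoded POST."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def time_variants(endpoint, timeout=300):
    """Runs each variant against the endpoint and prints the wall-clock time."""
    for label, query in ablation_variants().items():
        start = time.perf_counter()
        with urllib.request.urlopen(make_request(endpoint, query), timeout=timeout) as resp:
            resp.read()
        print(f"{label}: {time.perf_counter() - start:.1f}s")
```

Calling `time_variants("http://localhost:3030/ds/sparql")` (a hypothetical Fuseki URL; adjust host, port and dataset name) prints one timing per variant. A variant that is dramatically faster points at the OPTIONAL that was dropped.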

On Wed, 8 Dec 2021 at 09:25, Andy Seaborne <an...@apache.org> wrote:

> Lots of questions! Details matter!!
>
> On 08/12/2021 09:05, Matt Whitby wrote:
> > It's hosted in a container in Azure.
>
> (Jena storage layer)
>
> Using TDB1? TDB2?
>
> > I test it via Postman (though we're writing a RESTful API to sit on top).
>
> So this is Fuseki? Is there any inference being used?
>
>      Andy
>
> >
> > On Wed, 8 Dec 2021 at 09:00, Andy Seaborne <an...@apache.org> wrote:
> >
> >> Hi Matt,
> >>
> >> That query does not look couple-of-minutes expensive.
> >>
> >> Could you run it removing parts to see what happens? e.g. Remove one
> >> OPTIONAL and it's associated part of the filter.
> >>
> >> Which storage layer are you using?
> >>
> >>       Andy
> >>
> >> On 07/12/2021 20:18, ajs6f@apache.org wrote:
> >>> On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <ma...@gmail.com>
> wrote:
> >>>
> >>> I dare say running an lcase against each field doesn't help matters,
> but
> >> with
> >>> no other way of doing a case-insensitive search (well, Regex - but who
> >> likes
> >>> Regex?) I'm not sure.
> >>>
> >>>
> >>> On this point alone, if it does turn out that string processing is what
> >> is
> >>> costing you time, you might adjust your data to include a convenience
> >>> property with county, district, and parish in lowercase. Then you could
> >> do
> >>> a more direct (and cheaper) match.
> >>>
> >>> That having been said, it seems unlikely to me that timed-out queries
> are
> >>> due to something as cheap as lowercasing. Have you tried peeling off
> some
> >>> of those OPTIONALs to see how much they cost?
> >>>
> >>> Adam
> >>>
> >>>
> >>> On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <ma...@gmail.com>
> wrote:
> >>>
> >>>> I have a Sparql question if that's okay.
> >>>>
> >>>> There are only around 8m triples in our test data, so pretty small.
> >>>>
> >>>> The query takes a good couple of minutes to run (and sometimes just
> >> times
> >>>> out).
> >>>>
> >>>> I dare say running an lcase against each field doesn't help matters,
> but
> >>>> with no other way of doing a case-insensitive search (well, Regex -
> but
> >> who
> >>>> likes Regex?) I'm not sure.
> >>>>
> >>>> Any obvious ways to make it less bad?
> >>>>
> >>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>>> select ?s ?name
> >>>> where {
> >>>>
> >>>> ?s <http://www.historicengland.org.uk/data/schema/simplename/name>
> >> ?name .
> >>>>
> >>>> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/county>
> >>>> ?county}.
> >>>> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/district/
> >
> >>>> ?district}.
> >>>> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/parish>
> >>>> ?parish}.
> >>>>
> >>>> FILTER (CONTAINS(lcase(?county),"lewes") || CONTAINS(
> >>>> lcase(?district),"lewes") || CONTAINS( lcase(?parish),"lewes"))
> >>>>
> >>>> }
> >>>> limit 10
> >>>>
> >>>
> >>
> >
> >
>


-- 
Matt
Southend. Essex, England

Guff follows....

Me: http://www.about.me/matt.whitby


Photography: http://www.whitbyphoto.com


Travels: http://www.whitbyadventures.com


Music: http://www.last.fm/user/MattWhitby


Reading: https://www.goodreads.com/user_challenges/19398505


Development: https://www.hackerrank.com/matt_whitby

Re: Sparql Query

Posted by Andy Seaborne <an...@apache.org>.

Lots of questions! Details matter!!

On 08/12/2021 09:05, Matt Whitby wrote:
> It's hosted in a container in Azure.

(Jena storage layer)

Using TDB1? TDB2?

> I test it via Postman (though we're writing a RESTful API to sit on top).

So this is Fuseki? Is there any inference being used?

     Andy

> 
> On Wed, 8 Dec 2021 at 09:00, Andy Seaborne <an...@apache.org> wrote:
> 
>> Hi Matt,
>>
>> That query does not look couple-of-minutes expensive.
>>
>> Could you run it removing parts to see what happens? e.g. remove one
>> OPTIONAL and its associated part of the filter.
>>
>> Which storage layer are you using?
>>
>>       Andy
>>

Re: Sparql Query

Posted by Matt Whitby <ma...@gmail.com>.

It's hosted in a container in Azure.
I test it via Postman (though we're writing a RESTful API to sit on top).

On Wed, 8 Dec 2021 at 09:00, Andy Seaborne <an...@apache.org> wrote:

> Hi Matt,
>
> That query does not look couple-of-minutes expensive.
>
> Could you run it removing parts to see what happens? e.g. remove one
> OPTIONAL and its associated part of the filter.
>
> Which storage layer are you using?
>
>      Andy
>



Re: Sparql Query

Posted by Andy Seaborne <an...@apache.org>.

Hi Matt,

That query does not look couple-of-minutes expensive.

Could you run it removing parts to see what happens? e.g. remove one
OPTIONAL and its associated part of the filter.
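A sketch of that experiment, using the IRIs from Matt's query unchanged (including the trailing slash on the district predicate, which the county and parish predicates lack): the parish OPTIONAL and its CONTAINS test are dropped. Timing this against the full query, and then dropping each of the other OPTIONALs in turn, should indicate whether one property accounts for most of the cost.

```sparql
# Diagnostic variant: the parish OPTIONAL and its CONTAINS test are removed.
# Compare its run time with the full query, then repeat dropping a different OPTIONAL.
SELECT ?s ?name
WHERE {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/county> ?county }
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/district/> ?district }

  FILTER ( CONTAINS(LCASE(?county), "lewes")
        || CONTAINS(LCASE(?district), "lewes") )
}
LIMIT 10
```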

Which storage layer are you using?

     Andy

On 07/12/2021 20:18, ajs6f@apache.org wrote:
> On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <ma...@gmail.com> wrote:
> 
>> I dare say running an lcase against each field doesn't help matters, but with
>> no other way of doing a case-insensitive search (well, Regex - but who likes
>> Regex?) I'm not sure.
> 
> 
> On this point alone, if it does turn out that string processing is what is
> costing you time, you might adjust your data to include a convenience
> property with county, district, and parish in lowercase. Then you could do
> a more direct (and cheaper) match.
> 
> That having been said, it seems unlikely to me that timed-out queries are
> due to something as cheap as lowercasing. Have you tried peeling off some
> of those OPTIONALs to see how much they cost?
> 
> Adam

Re: Sparql Query

Posted by aj...@apache.org.

On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <ma...@gmail.com> wrote:

> I dare say running an lcase against each field doesn't help matters, but with
> no other way of doing a case-insensitive search (well, Regex - but who likes
> Regex?) I'm not sure.

On this point alone, if it does turn out that string processing is what is
costing you time, you might adjust your data to include a convenience
property with county, district, and parish in lowercase. Then you could do
a more direct (and cheaper) match.
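A minimal sketch of that suggestion as a SPARQL Update, assuming the dataset accepts updates. <http://example.org/placeNameLower> is a hypothetical property invented for illustration, not part of the Historic England schema. The update stores a lowercased copy of each county, district and parish value; the second statement, sent as a separate query request, can then look the value up with a plain triple pattern instead of a FILTER.

```sparql
# Step 1 (SPARQL Update request).
# <http://example.org/placeNameLower> is a HYPOTHETICAL convenience property.
INSERT {
  ?s <http://example.org/placeNameLower> ?lower
}
WHERE {
  { ?s <http://www.historicengland.org.uk/data/schema/county> ?place }
  UNION
  { ?s <http://www.historicengland.org.uk/data/schema/district/> ?place }
  UNION
  { ?s <http://www.historicengland.org.uk/data/schema/parish> ?place }
  # STR() drops any language tag so the stored value is a plain string.
  BIND ( LCASE(STR(?place)) AS ?lower )
}

# Step 2 (separate query request): exact match via a triple pattern, no FILTER.
SELECT ?s ?name
WHERE {
  ?s <http://example.org/placeNameLower> "lewes" .
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
}
LIMIT 10
```

The exact-match pattern only finds values whose whole lowercased text is "lewes"; a substring search would still need CONTAINS, though without a per-row LCASE.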

That having been said, it seems unlikely to me that timed-out queries are
due to something as cheap as lowercasing. Have you tried peeling off some
of those OPTIONALs to see how much they cost?
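One untested way to avoid the OPTIONALs altogether is to bind a single ?place variable through a UNION of the three properties (IRIs copied from Matt's query). Each row then carries a value to test, and a subject with several county, district or parish values yields one row per value rather than one row per combination of values. DISTINCT is added because a subject matching on more than one property would otherwise be listed more than once, so the results are not identical to the original's; whether this shape is faster on this data would need measuring.

```sparql
# Sketch: one ?place variable bound via UNION instead of three OPTIONALs.
SELECT DISTINCT ?s ?name
WHERE {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

  { ?s <http://www.historicengland.org.uk/data/schema/county> ?place }
  UNION
  { ?s <http://www.historicengland.org.uk/data/schema/district/> ?place }
  UNION
  { ?s <http://www.historicengland.org.uk/data/schema/parish> ?place }

  # STR() avoids an LCASE error if a value happens to be an IRI rather than a literal.
  FILTER ( CONTAINS(LCASE(STR(?place)), "lewes") )
}
LIMIT 10
```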

Adam

On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <ma...@gmail.com> wrote:

> I have a Sparql question if that's okay.
>
> There are only around 8m triples in our test data, so pretty small.
>
> The query takes a good couple of minutes to run (and sometimes just times
> out).
>
> I dare say running an lcase against each field doesn't help matters, but
> with no other way of doing a case-insensitive search (well, Regex - but who
> likes Regex?) I'm not sure.
>
> Any obvious ways to make it less bad?
>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> select ?s ?name
> where {
>
> ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
>
> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/county>
> ?county}.
> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/district/>
> ?district}.
> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/parish>
> ?parish}.
>
> FILTER (CONTAINS(lcase(?county),"lewes") || CONTAINS(
> lcase(?district),"lewes") || CONTAINS( lcase(?parish),"lewes"))
>
> }
> limit 10
>
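For comparison, the REGEX alternative mentioned in the question would use the "i" flag to make the match case-insensitive, so no LCASE is needed. A sketch with the same IRIs; there is no particular reason to expect it to be cheaper than LCASE plus CONTAINS.

```sparql
# Case-insensitive test using REGEX with the "i" flag instead of LCASE + CONTAINS.
SELECT ?s ?name
WHERE {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/county> ?county }
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/district/> ?district }
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/parish> ?parish }

  # REGEX on an unbound variable is an error; || still succeeds if another branch is true.
  FILTER ( REGEX(?county, "lewes", "i")
        || REGEX(?district, "lewes", "i")
        || REGEX(?parish, "lewes", "i") )
}
LIMIT 10
```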