Posted to users@jena.apache.org by Kunal Jain <ku...@ezeiatech.com> on 2012/04/19 16:42:04 UTC
Filter regex queries are very slow
Dear All,
I am developing an application using Jena along with TDB store. I have
loaded around 4 million triples in my store. A small subset of my triples is
as follows:
<http://mycompany/ontolgy/ext/#AD> <http://mycompany/ontology#name> "AD" .
<http://mycompany/ontolgy/ext/#AD> <http://mycompany/ontology#type> "category" .
<http://mycompany/ontolgy/ext/#AD1003> <http://mycompany/ontology#name> "AD100" .
<http://mycompany/ontolgy/ext/#AD1003> <http://mycompany/ontology#type> "product" .
<http://mycompany/ontolgy/ext/#AD1003> <http://mycompany/ontology#belongsTo> <http://mycompany/ontolgy/ext/#AD> .
<http://mycompany/ontolgy/ext/#light1> <http://mycompany/ontology#name> "Light" .
<http://mycompany/ontolgy/ext/#light1> <http://mycompany/ontology#type> "item" .
<http://mycompany/ontolgy/ext/#light1> <http://mycompany/ontology#belongsTo> <http://mycompany/ontolgy/ext/#AD1003> .
<http://mycompany/ontolgy/ext/#light1> <http://mycompany/ontology#value> "42.5833" .
Now I want to do free-text matching for autosuggest-style functionality. I
have this query to run against my store:
PREFIX gs: <http://mycompany/ontolgy/ext/#>
PREFIX vs: <http://mycompany/ontology#>
SELECT ?subjectName
WHERE {
  ?subject vs:name ?subjectName
  FILTER regex(?subjectName, "^Light", "i")
} LIMIT 10
In this query I am trying to find triples whose name starts with a particular
word, i.e. starting with 'Light'. Execution of this query takes around 20
seconds.
I am using jena core 2.7, jena arq 2.9 and jena tdb 0.9.
I need help figuring out how this can be optimized.
Thanks in advance.
Regards
Kunal
Re: Filter regex queries are very slow
Posted by Paolo Castagna <ca...@googlemail.com>.
Kunal Jain wrote:
> Hi,
>
> Thanks for the information. I am trying to have LARQ integration in the
> application.
> Damian, have you released the latest version? Currently, the version that is
> available is a bit old.
My fault. I am reading the documentation right now to make sure I follow the
correct process and push the right buttons.
Paolo
RE: Filter regex queries are very slow
Posted by Kunal Jain <ku...@ezeiatech.com>.
Hi,
Thanks for the information. I am trying to have LARQ integration in the
application.
Damian, have you released the latest version? Currently, the version that is
available is a bit old.
Thanks Again
Kunal
Re: Filter regex queries are very slow
Posted by Paolo Castagna <ca...@googlemail.com>.
Hi
Damian Steer wrote:
> LARQ adds a proper free text index [1] which should be much better. This is now a separate module, I believe (Paolo?).
> Personally I've used a separate trie index for autocompletion, since it tends to get hammered.
Correct, LARQ is a separate module now (and I still need to finalize the 1.0.0
release which has been voted and approved). Apologies, I have not done it yet
because it's my first time and I do not want to make mistakes (I still need to
re-read the documentation).
I'll do this over the week-end at the latest.
Cheers,
Paolo
> [1] <http://incubator.apache.org/jena/documentation/larq/index.html>
Re: Filter regex queries are very slow
Posted by Damian Steer <d....@bristol.ac.uk>.
On 19 Apr 2012, at 16:48, Rob Vesse wrote:
> Trie indexes are the simplest way to do prefix searches for auto-completion, but they require you to do all the implementation yourself because AFAIK we don't have a drop-in trie index module for ARQ.
>
> Damian - Is your trie index code something you could share?
>
> Rob
The trie is part of a simple autocompleter web application that talks to SPARQL endpoints, rather than an ARQ-accessible index. The trie code isn't mine, and the original seems to have vanished from the web.
I could put it up on github if anyone is interested.
Damian
Re: Filter regex queries are very slow
Posted by Rob Vesse <ra...@ecs.soton.ac.uk>.
+1 to both suggestions
LARQ has the benefit that it gives you proper full-text search, so you
can do more than simple auto-completion (rankings, result limits, full
Lucene queries), and that it is a relatively easy drop-in module.
Trie indexes are the simplest way to do prefix searches for
auto-completion, but they require you to do all the implementation
yourself because AFAIK we don't have a drop-in trie index module for ARQ.
Damian - Is your trie index code something you could share?
Rob
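As a rough illustration of the trie approach mentioned above, here is a minimal in-memory sketch in plain Java. It is not Damian's code and not part of Jena or ARQ; the class name NameTrie and its methods add and suggest are hypothetical. It assumes the names (the vs:name literals) have already been read out of the store by some other means, and it lower-cases keys so that lookups ignore case, much like the "i" flag in the original regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory prefix index; not part of Jena or ARQ. */
final class NameTrie {
    private static final class Node {
        // Sorted children give suggestions in a stable, alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        // Original spellings of the names that end at this node.
        final List<String> names = new ArrayList<>();
    }

    private final Node root = new Node();

    /** Adds a name, keyed by its lower-cased form so lookups ignore case. */
    void add(String name) {
        Node node = root;
        for (char c : name.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.names.add(name);
    }

    /** Returns up to limit names that start with prefix (case-insensitive). */
    List<String> suggest(String prefix, int limit) {
        Node node = root;
        for (char c : prefix.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return new ArrayList<>(); // no name has this prefix
            }
        }
        List<String> out = new ArrayList<>();
        collect(node, limit, out);
        return out;
    }

    /** Depth-first walk below the prefix node, stopping once limit is reached. */
    private static void collect(Node node, int limit, List<String> out) {
        for (String name : node.names) {
            if (out.size() >= limit) return;
            out.add(name);
        }
        for (Node child : node.children.values()) {
            if (out.size() >= limit) return;
            collect(child, limit, out);
        }
    }

    public static void main(String[] args) {
        NameTrie trie = new NameTrie();
        for (String n : new String[] {"AD", "AD100", "Light", "Lighthouse", "Lamp"}) {
            trie.add(n);
        }
        System.out.println(trie.suggest("light", 10));
    }
}
```

The lookup cost depends on the prefix length and the number of suggestions returned, not on the total number of names, which is why this shape suits an endpoint that is queried on every keystroke. Keeping the trie in sync with the store is left out of the sketch.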
Re: Filter regex queries are very slow
Posted by Damian Steer <d....@bristol.ac.uk>.
On 19 Apr 2012, at 15:42, Kunal Jain wrote:
> Dear All,
Hi there,
> I am developing an application using Jena along with TDB store. I have
> loaded around 4 million triples in my store. A small subset of my triples is
> as follows:
...
> Now I want to do free-text matching for autosuggest-style functionality. I
> have this query to run against my store:
> ?subject vs:name ?subjectName
>
> FILTER regex(?subjectName, "^Light", "i")
> In this query I am trying to find triples whose name starts with a particular
> word, i.e. starting with 'Light'. Execution of this query takes around 20
> seconds.
While it's possible this could be improved, it's never going to be great since the search is unindexed.
LARQ adds a proper free text index [1] which should be much better. This is now a separate module, I believe (Paolo?).
Personally I've used a separate trie index for autocompletion, since it tends to get hammered.
Damian
[1] <http://incubator.apache.org/jena/documentation/larq/index.html>
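To make the "unindexed" point concrete, the following plain-Java sketch contrasts the two access patterns. It is only an analogy and does not show how TDB, ARQ or LARQ work internally; the class PrefixLookupDemo and its methods scan, buildIndex and lookup are hypothetical names. scan tests every name against the pattern, which is roughly what a FILTER regex does to every binding of ?subjectName, while lookup reads only the matching key range of a sorted map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical demo of scanning versus a sorted-index range lookup. */
final class PrefixLookupDemo {
    /** Unindexed: every name is tested, so the cost grows with the data size. */
    static List<String> scan(List<String> names, String prefix) {
        // CASE_INSENSITIVE covers ASCII letters only unless UNICODE_CASE is added.
        Pattern p = Pattern.compile("^" + Pattern.quote(prefix), Pattern.CASE_INSENSITIVE);
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (p.matcher(name).find()) {
                out.add(name);
            }
        }
        return out;
    }

    /**
     * Builds a sorted index from lower-cased name to original name.
     * Simplification: names that differ only by case overwrite each other.
     */
    static NavigableMap<String, String> buildIndex(List<String> names) {
        NavigableMap<String, String> index = new TreeMap<>();
        for (String name : names) {
            index.put(name.toLowerCase(Locale.ROOT), name);
        }
        return index;
    }

    /** Indexed: reads only the keys in the range [prefix, prefix + '\uffff'). */
    static List<String> lookup(NavigableMap<String, String> index, String prefix) {
        String lo = prefix.toLowerCase(Locale.ROOT);
        String hi = lo + Character.MAX_VALUE; // upper bound just past the prefix range
        return new ArrayList<>(index.subMap(lo, true, hi, false).values());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("AD", "AD100", "Light", "Lighthouse", "lamp");
        System.out.println(scan(names, "light"));
        System.out.println(lookup(buildIndex(names), "light"));
    }
}
```

Both calls return the same names for this input; the difference is that the scan visits all of them each time, whereas the range lookup touches only the entries under the prefix once the index has been built.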