Posted to users@jena.apache.org by Kurt Laninga <st...@gmail.com> on 2013/11/06 22:22:54 UTC

Sharing an in-memory Dataset between threads

Is it possible to share an in memory Dataset between threads?
According to the Jena documentation on the Web site I can use two threads
to access the same Dataset that has a TDB backend like this:

Thread 1:

 Dataset dataset = TDBFactory.createDataset(location) ;
 dataset.begin(ReadWrite.WRITE) ;
 try {
   ...
   dataset.commit() ;
 } finally { dataset.end() ; }

Thread 2:

 Dataset dataset = TDBFactory.createDataset(location) ;
 dataset.begin(ReadWrite.READ) ;
 try {
   ...
 } finally { dataset.end() ; }

Is it possible to do this with an in-memory dataset?

Re: Jena-text starts-wth

Posted by Osma Suominen <os...@helsinki.fi>.

Hi Wolfgang!

I have the same problem - I want to do fast prefix searches using SPARQL 
- and I've considered ways to implement it. Currently I'm just using a 
prefix query like "head*" in the text:query, and then filtering the hits 
with a regex. (Thanks to Joshua for the strstarts trick, I'll have to 
test how much faster it is!)

The problem is that the StandardAnalyzer used by the jena-text Lucene 
index tokenizes the original literals into words, and Lucene then 
performs matches regardless of the position the word was in. So you will 
get matches even when the word (or prefix in this case) is not the first 
one in the literal, as in "DICOM Header Tag".
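
To make the tokenization effect concrete, here is a minimal plain-Java sketch. It does not use Lucene or jena-text; the class and method names are hypothetical, and the tokenizer is only a rough stand-in for what StandardAnalyzer does (lowercase, split on non-alphanumeric characters). It shows why a prefix query on a tokenized field matches "DICOM Header Tag" while a whole-literal prefix test does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenPrefixDemo {
    // Rough stand-in for an analyzer (an assumption, not Lucene's real code):
    // lowercase the literal and split it on runs of non-alphanumeric characters.
    public static List<String> tokenize(String literal) {
        List<String> tokens = new ArrayList<>();
        for (String t : literal.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // What a prefix query against a tokenized field effectively tests:
    // does ANY token start with the prefix, wherever it sits in the literal?
    public static boolean anyTokenStartsWith(String literal, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String t : tokenize(literal)) {
            if (t.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    // What the original question asks for: the whole literal starts with the prefix.
    public static boolean literalStartsWith(String literal, String prefix) {
        return literal.startsWith(prefix);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Head Injury", "DICOM Header Tag"}) {
            System.out.println(name
                    + " | token prefix match: " + anyTokenStartsWith(name, "Head")
                    + " | literal prefix match: " + literalStartsWith(name, "Head"));
        }
    }
}
```

Both names pass the token-level test, but only "Head Injury" passes the literal-level test, which is the gap the FILTER currently has to close.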

One way around this would be to tweak the jena-text Lucene 
implementation so that it doesn't tokenize the strings. Then I think you 
would get only real prefix matches. I haven't tried this with jena-text, 
but I've done similar things with plain Lucene in the past.
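
As an illustration of the untokenized idea, the following plain-Java sketch stores each literal as a single lowercased term in a sorted set and answers prefix queries with a range lookup. This is only an analogy for how an untokenized index field behaves, not jena-text or Lucene code; the class name and methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

public class WholeStringPrefixIndex {
    // Each literal is kept as ONE term (lowercased), never split into words.
    private final NavigableSet<String> terms = new TreeSet<>();

    public void add(String literal) {
        terms.add(literal.toLowerCase(Locale.ROOT));
    }

    // Prefix search as a range scan over the sorted terms:
    // every term t with prefix <= t < prefix + '\uffff'.
    // (Terms containing '\uffff' right after the prefix are ignored; fine for a sketch.)
    public List<String> prefixSearch(String prefix) {
        String low = prefix.toLowerCase(Locale.ROOT);
        String high = low + Character.MAX_VALUE;
        return new ArrayList<>(terms.subSet(low, true, high, false));
    }

    public static void main(String[] args) {
        WholeStringPrefixIndex index = new WholeStringPrefixIndex();
        index.add("Head Injury");
        index.add("DICOM Header Tag");
        index.add("Head Carcinoma");
        System.out.println(index.prefixSearch("Head"));
    }
}
```

Because "DICOM Header Tag" is stored as the single term "dicom header tag", it falls outside the range for "head" and is not returned.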

Currently jena-text is hardwired to use StandardAnalyzer with the 
default settings; you can't use anything else without altering the code. 
This was also a problem with LARQ, and I've discussed it in the past on 
this list:
http://mail-archives.apache.org/mod_mbox/jena-users/201209.mbox/%3C50448B34.6050300@aalto.fi%3E

Another option would be to switch to using jena-text with Solr. This 
requires a bit more setting up as you have to run the Solr server daemon 
as well. But in Solr you can configure how the indexing is done using 
the schema.xml file, so you could easily ask it not to tokenize strings. 
I haven't tried this yet either, but it might be an option for you.

-Osma

-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Jena-text starts-wth

Posted by Andy Seaborne <an...@apache.org>.

See the links to the syntax of Lucene query strings -- if that works for 
you, you can use it.

Even if you need to use both text:query and a FILTER, that is probably 
still faster than a FILTER alone, because the text index leaves fewer 
triples for the FILTER to consider.

	Andy

PS It helps to start a new thread, not reply to an existing one, when 
the topic is different.

Re: Jena-text starts-wth

Posted by hu...@aol.com.

Hi Andy,

I tried "Head*" but it does not work like "starts-with".

"Head*" matches "DICOM Header Tag", which just "Head" does not. So that behaves as expected.

But it still does not solve my "starts-with" problem since "DICOM Header Tag" was returned as part of the results in the first place. I only want matches like "Head Carcinoma", "Head Injury" etc. 

I checked out the two links you sent before posting this question. The tutorial mentions starts-with using the asterisk, but it matches any word in the text that starts-with the search string which is not what I am looking for.

How do I tell the text query that it should only look for matches at the start of the string? (like "^" in regex or strstarts).

-Wolfgang

Re: Jena-text starts-wth

Posted by Andy Seaborne <an...@apache.org>.

On 14/11/13 14:04, Joshua TAYLOR wrote:
> On Thu, Nov 14, 2013 at 7:42 AM,  <hu...@aol.com> wrote:
>>
>> I am using the following query to get all concepts that start with the word "Head".
>>
>>
>> PREFIX text: <http://jena.apache.org/text#>
>> PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>> PREFIX owl: <http://www.w3.org/2002/07/owl#>
>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>>
>> SELECT *
>> WHERE {
>> ?s text:query (nci:Preferred_Name 'Head') .
>> ?s nci:Preferred_Name ?prefName .
>> FILTER ( regex(?prefName, "^Head", "" ))
>> }
>>
>>
>> Is there a way of doing that in the text query itself without having to add a FILTER?
>
> Maybe the Jena Lucene combination can do something without a FILTER,
> but I don't know much about that, and can't help you out there.  I
> would point out, though, that you can make this FILTER less expensive
> by using SPARQL 1.1's STRSTARTS:
>
>      filter( strstarts( str(?prefName), "Head" ))
>
>
>
>

You can use the full Lucene query syntax on the default field:

    ?s text:query (nci:Preferred_Name 'Head*') .

http://www.lucenetutorial.com/lucene-query-syntax.html
http://lucene.apache.org/core/4_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html

	Andy

Re: Jena-text starts-wth

Posted by Joshua TAYLOR <jo...@gmail.com>.

On Thu, Nov 14, 2013 at 7:42 AM,  <hu...@aol.com> wrote:
>
> I am using the following query to get all concepts that start with the word "Head".
>
>
> PREFIX text: <http://jena.apache.org/text#>
> PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>
> SELECT *
> WHERE {
> ?s text:query (nci:Preferred_Name 'Head') .
> ?s nci:Preferred_Name ?prefName .
> FILTER ( regex(?prefName, "^Head", "" ))
> }
>
>
> Is there a way of doing that in the text query itself without having to add a FILTER?

Maybe the Jena Lucene combination can do something without a FILTER,
but I don't know much about that, and can't help you out there.  I
would point out, though, that you can make this FILTER less expensive
by using SPARQL 1.1's STRSTARTS:

    filter( strstarts( str(?prefName), "Head" ))




-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/

Jena-text starts-wth

Posted by hu...@aol.com.

Hi,

I am using the following query to get all concepts that start with the word "Head". 


PREFIX text: <http://jena.apache.org/text#> 
PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * 
WHERE { 
?s text:query (nci:Preferred_Name 'Head') . 
?s nci:Preferred_Name ?prefName . 
FILTER ( regex(?prefName, "^Head", "" ))  
}


Is there a way of doing that in the text query itself without having to add a FILTER? I read that adding an asterisk "Head*" would work like "starts-with", but that also matches anywhere in the string. And the regular expression start of string symbol "^" does not work either.

Regards,
Wolfgang

Re: Sharing an in-memory Dataset between threads

Posted by Andy Seaborne <an...@apache.org>.

Yes - but guard access with the dataset's lock (dataset.getLock()) rather 
than transactions.  There is no "abort" for an in-memory dataset.

	Andy
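
A minimal sketch of the multiple-reader/single-writer pattern this answer points to. To keep it runnable without Jena on the classpath, it uses java.util.concurrent's ReentrantReadWriteLock and a plain list as a stand-in for the in-memory dataset; the class and its methods are hypothetical. With Jena itself, the corresponding calls would be dataset.getLock().enterCriticalSection(Lock.READ) or enterCriticalSection(Lock.WRITE), paired with leaveCriticalSection() in a finally block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedStoreDemo {
    // Stand-in for the shared in-memory dataset (an assumption for this sketch).
    private final List<String> triples = new ArrayList<>();
    // Many readers may hold the read lock at once; a writer holds it exclusively.
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void add(String triple) {
        lock.writeLock().lock();          // analogous to enterCriticalSection(Lock.WRITE)
        try {
            triples.add(triple);
        } finally {
            lock.writeLock().unlock();    // analogous to leaveCriticalSection()
        }
    }

    public int size() {
        lock.readLock().lock();           // analogous to enterCriticalSection(Lock.READ)
        try {
            return triples.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Four writer threads and one reader thread share the same store.
    public static int runDemo() {
        SharedStoreDemo store = new SharedStoreDemo();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            threads.add(new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    store.add("writer" + id + "-triple" + j);
                }
            }));
        }
        threads.add(new Thread(() -> {
            for (int j = 0; j < 1000; j++) {
                store.size();             // concurrent reads while writers are active
            }
        }));
        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("triples stored: " + runDemo());
    }
}
```

Unlike a transaction, a lock only serialises access: if a writer fails halfway through its critical section, the changes it already made stay in place.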

