Posted to users@jena.apache.org by Mark Wharton <ma...@iotic-labs.com> on 2016/03/10 15:11:13 UTC

Deleted triples still in the Lucene index?

Hi

Deleting triples still leaves them in the Lucene text search index.
This leads to some questions:

1) Is this expected behaviour?
2) If it's not, what did I do wrong?
3) If it is, do I have to re-run jena.textindexer to keep it
up-to-date?  This is a pain, because I have to stop Fuseki in order to
do it.

Steps to reproduce:

I'm running:
Fuseki 2.3.0 2015-07-25T17:11:28+0000
Attached is my config to set up the spatial index

1a) Insert a triple into an empty database

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
INSERT DATA {
  <urn:uuid:abc> rdfs:label "abc"@en
}

1b) Query for it with the text label

PREFIX text: <http://jena.apache.org/text#>
SELECT ?s
{ ?s text:query "abc" .
}

returns
------------------
| s              |
==================
| <urn:uuid:abc> |
------------------

1c) ASK if there are any triples with subject <urn:uuid:abc>
ASK
{
?s ?p ?o . filter(?s = <urn:uuid:abc>)
}

Returns "yes"

2a) Then delete it:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DELETE { ?s rdfs:label ?label }
WHERE
{
  ?s rdfs:label ?label . FILTER(?s = <urn:uuid:abc>)
}

3a) Run the ASK again; it now returns "no"

3b) Run the text query again; it still returns
------------------
| s              |
==================
| <urn:uuid:abc> |
------------------
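
The steps above can also be scripted. What follows is a minimal Python sketch, assuming a Fuseki dataset reachable at http://localhost:3030/ds with /query and /update services. The host, port and dataset name "ds" are assumptions, not taken from the attached config, and the helper names are made up for this example. It uses only the standard library and sends the same requests as steps 1a, 2a, 3a and 3b.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a dataset called "ds" on a local Fuseki; adjust to your setup.
BASE = "http://localhost:3030/ds"

PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX text: <http://jena.apache.org/text#>\n"
)
INSERT = PREFIXES + 'INSERT DATA { <urn:uuid:abc> rdfs:label "abc"@en }'
DELETE = PREFIXES + (
    "DELETE { ?s rdfs:label ?label } "
    "WHERE { ?s rdfs:label ?label . FILTER(?s = <urn:uuid:abc>) }"
)
TEXT_QUERY = PREFIXES + 'SELECT ?s { ?s text:query "abc" . }'
ASK = "ASK { ?s ?p ?o . FILTER(?s = <urn:uuid:abc>) }"


def update_request(update, base=BASE):
    """Build (but do not send) a SPARQL Update POST request."""
    return urllib.request.Request(
        base + "/update",
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


def query_request(query, base=BASE):
    """Build (but do not send) a SPARQL query POST request asking for JSON."""
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    return urllib.request.Request(
        base + "/query",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def send(request):
    """Send a request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


def reproduce():
    """Insert, delete, then report what the ASK (3a) and text query (3b) say."""
    send(update_request(INSERT))
    send(update_request(DELETE))
    still_in_rdf = json.loads(send(query_request(ASK)))["boolean"]
    hits = json.loads(send(query_request(TEXT_QUERY)))["results"]["bindings"]
    return still_in_rdf, [binding["s"]["value"] for binding in hits]
```

Calling reproduce() needs a running server. If the index still holds the deleted entry, the second value will list urn:uuid:abc even though the first is False.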



-- 
Technology Lead, Iotic Labs
+44 7973 674404
mark.wharton@iotic-labs.com
https://www.iotic-labs.com

Re: Deleted triples still in the Lucene index?

Posted by Osma Suominen <os...@helsinki.fi>.
On 13/03/16 22:05, Andy Seaborne wrote:

> Please do submit a pull request with some changes.  Code does not grow
> on trees.  Open source is a low cost means of production but it still
> takes people-time from somewhere.

For the record, here is the PR that implemented the delete handling in 
jena-text: https://github.com/apache/jena/pull/53

I'm not very familiar with jena-spatial code but I think a similar 
feature could be added there - perhaps sharing some of the code as well.

-Osma

-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Deleted triples still in the Lucene index?

Posted by Andy Seaborne <an...@apache.org>.
On 13/03/16 07:56, Mark Wharton wrote:
> Hi Andy.
>
> That's not quite what I wanted to hear (British understatement).  We're
> using Jena to keep track of things in an IoT solution.  As it stands if
> something changes location, then any search will have a "ghost image" of
> it where it used to be.

The text query documentation has this example:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX text: <http://jena.apache.org/text#>
SELECT ?s
{ ?s text:query (rdfs:label 'word' 10) ;
      rdfs:label ?label
}

where it does a text search, then checks that the retrieved ?s is still in the
RDF.  Maintaining the index on a per-triple basis is potentially very
expensive; a scheme that does not do so, and instead has the query check the
RDF (which may be needed anyway to get other info), may well perform
better.  There could be a periodic (offline) rebuild of the index to
garbage-collect stale entries.
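
To make the trade-off concrete, here is a toy in-memory model in Python. It is not jena-text code; the class and method names are invented for illustration. The index is never touched on delete, the checked query filters stale hits against the RDF, and a rebuild clears them out.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class ToyDataset:
    """Hypothetical stand-in for a dataset with an append-only text index."""

    def __init__(self):
        self.triples = set()   # the RDF: (subject, predicate, literal)
        self.index = {}        # the text index: word -> set of subjects

    def _index(self, s, o):
        for word in o.lower().split():
            self.index.setdefault(word, set()).add(s)

    def insert(self, s, p, o):
        self.triples.add((s, p, o))
        self._index(s, o)

    def delete(self, s, p, o):
        # Only the RDF is changed; the index entry is left behind.
        self.triples.discard((s, p, o))

    def text_query(self, word):
        """Plain index lookup: may return subjects whose triples are gone."""
        return set(self.index.get(word.lower(), ()))

    def checked_text_query(self, word, p=RDFS_LABEL):
        """Index lookup, then keep only subjects that still have a p triple."""
        live = {s for (s, pred, _) in self.triples if pred == p}
        return self.text_query(word) & live

    def rebuild_index(self):
        """Periodic rebuild from the RDF, dropping stale entries."""
        self.index = {}
        for s, _, o in self.triples:
            self._index(s, o)


ds = ToyDataset()
ds.insert("urn:uuid:abc", RDFS_LABEL, "abc")
ds.delete("urn:uuid:abc", RDFS_LABEL, "abc")
ghost = ds.text_query("abc")            # stale hit straight from the index
checked = ds.checked_text_query("abc")  # same lookup, filtered against the RDF
ds.rebuild_index()
after_rebuild = ds.text_query("abc")    # index lookup once stale entries are gone
```

Note that the check, like the rdfs:label pattern in the query above, only confirms that the subject still has some label; it does not confirm that the label still matches the search word.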

>
> Have you any thoughts on when these changes might be propagated into
> spatial?

Please do submit a pull request with some changes.  Code does not grow 
on trees.  Open source is a low cost means of production but it still 
takes people-time from somewhere.

	Andy

https://en.wikipedia.org/wiki/English_understatement

>
> Mark
>
> Technology Lead, Iotic Labs
> +44 7973 674404
> mark.wharton@iotic-labs.com
> https://www.iotic-labs.com


Re: Deleted triples still in the Lucene index?

Posted by Mark Wharton <ma...@iotic-labs.com>.
Hi Andy.

That's not quite what I wanted to hear (British understatement).  We're
using Jena to keep track of things in an IoT solution.  As it stands, if
something changes location, then any search will show a "ghost image" of
it where it used to be.

Have you any thoughts on when these changes might be propagated into
spatial?

Mark

Technology Lead, Iotic Labs
+44 7973 674404
mark.wharton@iotic-labs.com
https://www.iotic-labs.com

On 11/03/16 18:18, Andy Seaborne wrote:
> Hi there,
> 
> From memory, those changes made to jena-text didn't get put into
> jena-spatial.
> 
> Now the functionality of each is clearer, what would be good is for
> there to be a common framework with jena-text and jena-spatial sharing
> as much as makes sense.
> 
>     Andy


Re: Deleted triples still in the Lucene index?

Posted by Andy Seaborne <an...@apache.org>.
Hi there,

From memory, those changes made to jena-text didn't get put into
jena-spatial.

Now that the functionality of each is clearer, it would be good to
have a common framework, with jena-text and jena-spatial sharing
as much as makes sense.

	Andy


On 11/03/16 10:21, Mark Wharton wrote:
> Hi Rob.
>
> That worked a treat, thanks.
>
> One more question.  Is there a similar thing for the Spatial index ?
> I've tried adding:
>      spatial:uidField      "uid" ;
>
> to the spatial config file, but that doesn't seem to work. (attached)
>
> Mark
>
> Technology Lead, Iotic Labs
> +44 7973 674404
> mark.wharton@iotic-labs.com
> https://www.iotic-labs.com


Re: Deleted triples still in the Lucene index?

Posted by Mark Wharton <ma...@iotic-labs.com>.
Hi Rob.

That worked a treat, thanks.

One more question.  Is there something similar for the spatial index?
I've tried adding:
    spatial:uidField      "uid" ;

to the spatial config file (attached), but that doesn't seem to work.

Mark

Technology Lead, Iotic Labs
+44 7973 674404
mark.wharton@iotic-labs.com
https://www.iotic-labs.com

On 10/03/16 14:44, Rob Vesse wrote:
> Mark
> 
> 1 & 2)
> 
> Yes this is the default behaviour
> 
> 3)
> 
> You can configure your index to support deletions as detailed at:
> 
> https://jena.apache.org/documentation/query/text-query.html#deletion-of-indexed-entities
> 
> Specifically you need to add a text:uidField entry to your entity map
> configuration
> 
> If your existing index does not yet have this (which based on your config
> it does not) then you will need to rebuild the index in order for it to
> support deletions going forward
> 
> Rob

Re: Deleted triples still in the Lucene index?

Posted by Rob Vesse <rv...@dotnetrdf.org>.
Mark

1 & 2)

Yes, this is the default behaviour.

3)

You can configure your index to support deletions as detailed at:

https://jena.apache.org/documentation/query/text-query.html#deletion-of-indexed-entities

Specifically, you need to add a text:uidField entry to your entity map
configuration.

If your existing index does not yet have this (which, based on your config,
it does not), then you will need to rebuild the index in order for it to
support deletions going forward.
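
As a rough illustration of why a per-entry identifier matters for deletion, here is a toy Python model. It is not jena-text's implementation; the hashing scheme and all names are assumptions made for the example. Each index entry is stored under a key derived from the triple, so a delete can locate exactly that entry. Entries written without such a key could not be found this way, which is consistent with having to rebuild an existing index.

```python
import hashlib


def triple_uid(s, p, o):
    """Hypothetical unique id for a triple: a hash over its three terms."""
    joined = "\x00".join((s, p, o))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class UidIndex:
    """Toy text index whose entries are keyed by a per-triple uid."""

    def __init__(self):
        self.entries = {}  # uid -> (subject, indexed text)

    def add(self, s, p, o):
        self.entries[triple_uid(s, p, o)] = (s, o)

    def delete(self, s, p, o):
        # Recompute the uid from the deleted triple and drop that entry only.
        self.entries.pop(triple_uid(s, p, o), None)

    def search(self, word):
        word = word.lower()
        return {s for s, text in self.entries.values()
                if word in text.lower().split()}


LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
idx = UidIndex()
idx.add("urn:uuid:abc", LABEL, "abc")
idx.add("urn:uuid:def", LABEL, "abc def")
before = idx.search("abc")                 # both subjects match
idx.delete("urn:uuid:abc", LABEL, "abc")   # removes only the first entry
after = idx.search("abc")
```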

Rob
