Posted to users@jena.apache.org by Laura Morales <la...@mail.com> on 2017/04/11 09:16:55 UTC

Jena native store indexes

With RDBMSes, indexes are a big topic and should always be taken into consideration when writing queries. With RDF stores, by contrast, I barely see them mentioned at all. So I was wondering how indexes work in the Jena native store, or in native stores in general. When writing SPARQL queries, should I be aware of any particular index? Should I create new indexes myself (and if so, how)?

Re: Jena native store indexes

Posted by "A. Soroka" <aj...@virginia.edu>.

The Jena list can't really answer questions about "any RDF store", but for TDB, you begin with basic covering indexes, so you do not need to add anything (in fact you cannot add anything) to provide more indexing for standard SPARQL forms.

As has been pointed out, there are _extensions_ to SPARQL provided by Jena that can make use of additional indexes:

https://jena.apache.org/documentation/query/text-query.html

and

https://jena.apache.org/documentation/query/spatial-query.html
---
A. Soroka
The University of Virginia Library

> On Apr 11, 2017, at 1:30 PM, Laura Morales <la...@mail.com> wrote:
> 
> But is Jena (or any RDF store for that matter) expected to perform well even if I don't explicitly add any index?
> 
> 
>> You 'can' create text-indexes for selected properties of your data for
>> text search with a much better performance:
>> 
>> https://jena.apache.org/documentation/query/text-query.html

Re: Jena native store indexes

Posted by ba...@gmail.com.

On Mon, 24 Apr 2017 15:05:19 +0200, Martynas Jusevičius
<ma...@graphity.org> wrote:

> "Should have been, could have been". It is how it is, your opinion is  
> just one of many and you will achieve nothing by complaining on this list. Go
> create a W3C Community Group and initiate some real work to achieve the
> standardisation that you think is required.

You 'fundamentally misunderstand' what I want to achieve: some background
information about problems that have been waiting at the back of my head
for answers for many years. For me, the comments from Rob and Andy have been
very informative about how they think, and vice versa they learned how a
user thinks about this and that in the context of this thread.

The rest sounds to me like a posting to 'Army Times', though I miss the '!'
at the end.

baran


-- 
Using Opera's mail client: http://www.opera.com/mail/

Re: Jena native store indexes

Posted by Martynas Jusevičius <ma...@graphity.org>.

"Should have been, could have been". It is how it is, your opinion is just
one of many and you will achieve nothing by complaining on this list. Go
create a W3C Community Group and initiate some real work to achieve the
standardisation that you think is required.


Re: Jena native store indexes

Posted by james anderson <ja...@dydra.com>.

good morning;

> On 2017-04-26, at 10:33, baran.ha@gmail.com wrote:
> 
> […]
> 
> Is it so far fetched thinking to standardise 'only' the SPARQL 'syntax' for text-indexing?

that would not resolve your complaint.
there are proposals which use facilities described in the recommendation to extend bgp entailment by associating alternative matching mechanisms with particular (combinations of) predicates.
with those, the syntax need not change, but the problem shifts to agreeing on the entailment regime.

best regards, from berlin,
---
james anderson | james@dydra.com | http://dydra.com

Re: Jena native store indexes

Posted by ba...@gmail.com.

Hello,

You generalised the problem of standardisation, suggesting that the
extensions be standardised first as an important step towards the main
standardisation, I think. Formulating things this way is essentially
important for all users and very stimulating. It is the first email in many
years that I have printed, and it is on my desk ready for further study over
the next days.

The next thing is text-indexing: I have had nothing to do with text-indexing
implementations, but I can imagine what a 'huge' problem it can be NOW to
'standardise' it for SPARQL. On the other hand, on the Fuseki side I can add
text-indexing for 'all' properties of interest, and on the Virtuoso side I
can use the bif:contains extension:

Is it so far fetched thinking to standardise 'only' the SPARQL 'syntax'  
for text-indexing?

Thank you very much, baran

PS: For about 6-7 years I have been on the other side of this environment:
a querying UI for public endpoints, switching fluently from one endpoint to
another, a pure HTML+JavaScript thing...

*******

On Tue, 25 Apr 2017 12:37:42 +0200, Rob Vesse <rv...@dotnetrdf.org> wrote:

> Actually, no, I am not fundamentally satisfied. I was trying to explain
> how the current situation came to be in reply to your assertion that
> “some idiocy” was responsible and in the context of your specific
> complaint about text indexing.
>
> In general property functions as they exist in a variety of  
> implementations all try to address a limitation of the language in that  
> we have limited ways to introduce new solutions into a query:
>
> 1 - Pattern matches
> 2 - BIND()/Project Expressions
> 3 - Aggregation
> 4 - Values
>
> 2 is limited in that you can only introduce additional columns to  
> pre-existing solutions introduced by the other forms, 3 is limited in  
> that it reduces data. 4 only permits static data
>
> What I would like to see in the language is a generalised mechanism to  
> allow inserting extensions that expand the possible solutions e.g.
>
> SELECT *
> WHERE
> {
>  ?s a <http://example> .
>  INVOKE <http://text-indexing>(?s, "arg1", "arg2") RETURNS (?o)
>  ?s ?p ?o
> }
>
> However, no such extension exists currently to my knowledge nor do I  
> have the free time to investigate the potential ways to implement such a  
> solution. If no such extensions come into existence then there is very  
> little chance that they would make their way into future standards. So I  
> can complain about this all I want but it won’t change anything.
>
> On the other hand, text indexing which is by now a widely supported  
> extension will likely be a prime candidate for future standardisation
>
> There are other limitations in the language that have been discussed on
> these lists in the past, e.g. supporting custom aggregations. Why doesn’t
> the language support standard deviation as a standard aggregate?
> Ultimately a working group has limited time and limited scope; not
> everything that everybody wants present in the language will make it
> into the standard. That is why we have vendor-specific extensions
> despite all the other interoperability problems that those create for
> myself and other users.
>
> I would reiterate the point I often make when people ask why X cannot  
> achieve Y:
>
> A tool is designed for a specific set of jobs, it is not designed to
> solve every possible problem! Don’t forget that you are a programmer
> and that you have a general-purpose programming language at your
> disposal. You can use this to achieve solutions to many more problems
> than your tool alone provides for.
>
> Rob
>
> On 24/04/2017 12:30, "baran.ha@gmail.com" <ba...@gmail.com> wrote:
>
>     Where SPARQL is now relating to text-indexing, this is  
> 'fundamentally' not
>     acceptable for me. And you seem to be 'fundamentally' satisfied...
>
>
>
>



Re: Jena native store indexes

Posted by Rob Vesse <rv...@dotnetrdf.org>.

Actually, no, I am not fundamentally satisfied. I was trying to explain how the current situation came to be, in reply to your assertion that “some idiocy” was responsible and in the context of your specific complaint about text indexing.

In general, property functions as they exist in a variety of implementations all try to address a limitation of the language, namely that we have limited ways to introduce new solutions into a query:

1 - Pattern matches
2 - BIND()/Project Expressions
3 - Aggregation
4 - Values

2 is limited in that you can only introduce additional columns to pre-existing solutions introduced by the other forms; 3 is limited in that it reduces data; 4 only permits static data.

What I would like to see in the language is a generalised mechanism to allow inserting extensions that expand the possible solutions e.g.

SELECT *
WHERE
{
?s a <http://example> .
INVOKE <http://text-indexing>(?s, "arg1", "arg2") RETURNS (?o)
?s ?p ?o
}

However, no such extension exists currently to my knowledge nor do I have the free time to investigate the potential ways to implement such a solution. If no such extensions come into existence then there is very little chance that they would make their way into future standards. So I can complain about this all I want but it won’t change anything.
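
The difference between BIND, which only adds a column to existing solutions, and an extension step that can expand the set of solutions can be sketched in plain Python. This is only an illustration of the idea, not SPARQL semantics and not any engine's API: solutions are modelled as lists of dicts, and the names `bind`, `invoke`, `text_lookup` and `FAKE_INDEX` are all hypothetical.

```python
def bind(solutions, var, fn):
    """BIND-like step: adds one column to each existing solution.

    The number of solutions never changes.
    """
    return [{**row, var: fn(row)} for row in solutions]


def invoke(solutions, extension, in_var, out_var):
    """Hypothetical INVOKE-like step.

    The extension may return zero or more values per input solution,
    so solutions can disappear or multiply.
    """
    out = []
    for row in solutions:
        for value in extension(row[in_var]):
            out.append({**row, out_var: value})
    return out


# Stand-in for an external text index: subject -> matching literals.
FAKE_INDEX = {"ex:a": ["red apple", "green apple"], "ex:b": []}


def text_lookup(subject):
    return FAKE_INDEX.get(subject, [])


solutions = [{"s": "ex:a"}, {"s": "ex:b"}]

# Still two rows, each with an extra column.
bound = bind(solutions, "len", lambda row: len(row["s"]))

# Two rows for ex:a, none for ex:b.
expanded = invoke(solutions, text_lookup, "s", "o")
```

How arguments and return variables would really be passed is left open in the query above; the sketch picks one possible reading.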

On the other hand, text indexing, which is by now a widely supported extension, will likely be a prime candidate for future standardisation.

There are other limitations in the language that have been discussed on these lists in the past, e.g. supporting custom aggregations. Why doesn’t the language support standard deviation as a standard aggregate? Ultimately a working group has limited time and limited scope; not everything that everybody wants present in the language will make it into the standard. That is why we have vendor-specific extensions despite all the other interoperability problems that those create for myself and other users.

I would reiterate the point I often make when people ask why X cannot achieve Y:

A tool is designed for a specific set of jobs, it is not designed to solve every possible problem! Don’t forget that you are a programmer and that you have a general-purpose programming language at your disposal. You can use this to achieve solutions to many more problems than your tool alone provides for.
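
As one illustration of falling back on a general-purpose language, standard deviation can be computed client-side over the rows returned by a plain SELECT query. The sketch below is a minimal example under assumptions: the rows are hard-coded stand-ins for query results, and the variable names `sensor` and `value` and the helper `stddev_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical rows, standing in for the results of a query such as
#   SELECT ?sensor ?value WHERE { ... }
rows = [
    {"sensor": "a", "value": 2.0},
    {"sensor": "a", "value": 4.0},
    {"sensor": "a", "value": 4.0},
    {"sensor": "a", "value": 4.0},
    {"sensor": "a", "value": 5.0},
    {"sensor": "a", "value": 5.0},
    {"sensor": "a", "value": 7.0},
    {"sensor": "a", "value": 9.0},
    {"sensor": "b", "value": 1.0},
    {"sensor": "b", "value": 3.0},
]


def stddev_by_group(rows, group_var, value_var):
    """Population standard deviation of value_var for each group_var."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_var]].append(row[value_var])
    return {key: pstdev(values) for key, values in groups.items()}


result = stddev_by_group(rows, "sensor", "value")
```

The trade-off is that every value has to be shipped to the client, whereas a built-in aggregate would return one row per group.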

Rob

On 24/04/2017 12:30, "baran.ha@gmail.com" <ba...@gmail.com> wrote:

Where SPARQL is now relating to text-indexing, this is 'fundamentally' not
acceptable for me. And you seem to be 'fundamentally' satisfied...

Re: Jena native store indexes

Posted by ba...@gmail.com.

Hello,

> You seem to fundamentally misunderstand how the standardisation process  
> works.

The point is not whether I understand standardisation or not; the point is
your argument

> ....  At the time that SPARQL 1.1 was standardised indexing was not a  
> widely used extension so there was no impetus to standardise it.

No supply, no demand. The torture of creating text-indexing for each
property outside of SPARQL syntax, and then not even being compatible with
other SPARQL implementations, yields no statistical statement about whether
text-indexing has been widely used or not.

In my posting I pointed out that text-indexing should have had top priority
when starting from scratch to develop a query language for the Semantic Web
environment. You don't think so, and this has nothing to do with the
'fundamental' knowledge of a user; it has to do with setting different
priorities.

Where SPARQL is now relating to text-indexing, this is 'fundamentally' not  
acceptable for me. And you seem to be 'fundamentally' satisfied...

baran

*************

> One might imagine that a future round of standardisation
> would choose to consider this as one candidate for a new feature in a
> future version of the standard.
>
> Rob
>
> On 22/04/2017 11:02, "baran.ha@gmail.com" <ba...@gmail.com> wrote:
>
>     ...(text search with text-indexing) cannot be officially expressed in
>     SPARQL.
>     I don't think Jena Development was responsible for this, but i assume they
>     know who and i as a user want also know who is in the history of SPARQL
>     development responsible for this idiocy...
>
>
>
>


Re: Jena native store indexes

Posted by Andy Seaborne <an...@apache.org>.

On 24/04/17 10:57, Rob Vesse wrote:
> You seem to fundamentally misunderstand how the standardisation
> process works. The intent of a standard is never to specify every
> feature that exists or that could exist but rather to specify a set
> of standard functionality that will be useful to end users while also
> being amenable to multiple interoperable implementations.
>
> For a technology like text indexing where there is a huge variety of
> approaches standardising would be hugely difficult. For example if
> you pick a particular technology e.g. Lucene then you automatically
> exclude any implementations in languages/environments where Lucene is
> not usable. If you specify a behaviour then you potentially create a
> huge burden for implementers in trying to make disparate underlying
> technologies produce a specific set of answers for a specific set
> of standardised test cases that may bear little relation to
> real-world use cases.

Indeed, all those issues.

For SPARQL 1.1, text indexing was discussed as a possible work item 
(consult the email archives) but the task was huge. There was no 
standard text search language (unlike regex, which are defined by XQuery).

No one volunteered to do the work.

Typical WG lifecycle - reasonable number of people at the start when 
defining the work program, fewer to do the work, fewer still to complete 
the work, respond to comments, etc.

     Andy

>
> Additionally each round of standardisation takes input based upon
> commonly used extensions in the real-world as input and works to
> standardise those.  At the time that SPARQL 1.1 was standardised
> indexing was not a widely used extension so there was no impetus to
> standardise it. One might imagine that a future round of
> standardisation would choose to consider this as one candidate for a
> new feature in a future version of the standard.
>
> Rob
>
> On 22/04/2017 11:02, "baran.ha@gmail.com" <ba...@gmail.com>
> wrote:
>
> ...(text search with text-indexing) cannot be officially expressed in
>  SPARQL.
>
> I don't think Jena Development was responsible for this, but i assume
> they know who and i as a user want also know who is in the history of
> SPARQL development responsible for this idiocy...
>
>
>
>

Re: Jena native store indexes

Posted by Rob Vesse <rv...@dotnetrdf.org>.

You seem to fundamentally misunderstand how the standardisation process works. The intent of a standard is never to specify every feature that exists or that could exist but rather to specify a set of standard functionality that will be useful to end users while also being amenable to multiple interoperable implementations.

For a technology like text indexing, where there is a huge variety of approaches, standardising would be hugely difficult. For example, if you pick a particular technology, e.g. Lucene, then you automatically exclude any implementations in languages/environments where Lucene is not usable. If you specify a behaviour, then you potentially create a huge burden for implementers in trying to make disparate underlying technologies produce a specific set of answers for a specific set of standardised test cases that may bear little relation to real-world use cases.

Additionally, each round of standardisation takes commonly used real-world extensions as input and works to standardise those. At the time that SPARQL 1.1 was standardised, indexing was not a widely used extension, so there was no impetus to standardise it. One might imagine that a future round of standardisation would choose to consider this as one candidate for a new feature in a future version of the standard.

Rob

On 22/04/2017 11:02, "baran.ha@gmail.com" <ba...@gmail.com> wrote:

...(text search with text-indexing) cannot be officially expressed in
SPARQL.

I don't think Jena Development was responsible for this, but i assume they
know who and i as a user want also know who is in the history of SPARQL
development responsible for this idiocy...

Re: Jena native store indexes

Posted by ba...@gmail.com.

On Wed, 12 Apr 2017 15:01:34 +0200, Rob Vesse <rv...@dotnetrdf.org> wrote:

> .....
> In the RDF world it may still be useful to create secondary indexes as  
> others have noted for certain kinds of specialised search that cannot be  
> officially expressed in SPARQL.

Text indexing is primarily what is meant here, I assume.

But indexing the object literals of my rdfs:label properties alone is
definitely not 'secondary' indexing. I know what a performance jump it makes,
and, judging from my experience with querying clients, I think text-indexing
for 'all' corresponding properties must have 'top priority' in Semantic Web
query issues.

And from the statement above I can easily conclude:

...(text search with text-indexing) cannot be officially expressed in
SPARQL.

I don't think Jena Development was responsible for this, but i assume they  
know who and i as a user want also know who is in the history of SPARQL  
development responsible for this idiocy...

baran


Re: Jena native store indexes

Posted by Rob Vesse <rv...@dotnetrdf.org>.

An RDF store is basically a four-column database, so an implementation can automatically construct the necessary indexes to service any simple scan, i.e. a basic graph pattern. Queries can be answered efficiently by having a sufficiently smart optimiser and using precomputed statistics about the data to perform the index scans and joins in the most efficient order.

This is very different from relational databases, which have to deal with arbitrarily structured tables and which typically only index on the primary and foreign keys of tables by default. Therefore in the relational world it is common to define your own custom indexes based on how your application accesses the data, e.g. a person's name is unlikely to be a primary key but is often used to search the database.

In the RDF world it may still be useful to create secondary indexes as others have noted for certain kinds of specialised search that cannot be officially expressed in SPARQL.
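
The point about automatically constructed indexes can be sketched with a toy in-memory store. This is a minimal illustration, not how Jena or TDB is implemented: it uses triples rather than four columns for brevity, keeps every triple in three orderings (SPO, POS, OSP), and shows that any pattern with at least one bound term is answered by an index lookup rather than a full scan. The class and helper names are hypothetical.

```python
from collections import defaultdict


def _nested():
    # first key -> second key -> set of third values
    return defaultdict(lambda: defaultdict(set))


def _scan(index, k1, k2=None, k3=None):
    """Yield (k1, k2, k3) entries from a nested index.

    k1 must be bound; k2 and k3 may be None, meaning "any value".
    """
    level2 = index.get(k1, {})
    second_keys = [k2] if k2 is not None else list(level2)
    for b in second_keys:
        for c in level2.get(b, ()):
            if k3 is None or c == k3:
                yield (k1, b, c)


class TripleStore:
    """Toy store keeping every triple in three orderings: SPO, POS, OSP."""

    def __init__(self):
        self.spo, self.pos, self.osp = _nested(), _nested(), _nested()

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Return sorted (s, p, o) triples for a pattern; None is a wildcard."""
        if s is not None and (p is not None or o is None):
            # Patterns (s ? ?), (s p ?), (s p o) use SPO.
            found = _scan(self.spo, s, p, o)
        elif p is not None:
            # Patterns (? p ?), (? p o) use POS.
            found = ((s2, p2, o2) for p2, o2, s2 in _scan(self.pos, p, o))
        elif o is not None:
            # Patterns (? ? o), (s ? o) use OSP.
            found = ((s2, p2, o2) for o2, s2, p2 in _scan(self.osp, o, s))
        else:
            # Nothing bound: walk the whole SPO index.
            found = (t for s2 in list(self.spo) for t in _scan(self.spo, s2))
        return sorted(found)


store = TripleStore()
store.add("alice", "knows", "bob")
store.add("alice", "name", "Alice")
store.add("carol", "knows", "bob")

who_knows_bob = store.match(p="knows", o="bob")
about_alice = store.match(s="alice")
alice_to_bob = store.match(s="alice", o="bob")
```

Because the shape of the data is fixed, the store can pick these orderings up front, which is why no user-defined indexes are needed for basic graph patterns.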

On 11/04/2017 18:30, "Laura Morales" <la...@mail.com> wrote:

    But is Jena (or any RDF store for that matter) expected to perform well even if I don't explicitly add any index?

    > You 'can' create text-indexes for selected properties of your data for
    > text search with a much better performance:
    > 
    > https://jena.apache.org/documentation/query/text-query.html

Re: Jena native store indexes

Posted by Laura Morales <la...@mail.com>.

But is Jena (or any RDF store for that matter) expected to perform well even if I don't explicitly add any index?


> You 'can' create text-indexes for selected properties of your data for
> text search with a much better performance:
> 
> https://jena.apache.org/documentation/query/text-query.html

Re: Jena native store indexes

Posted by ba...@gmail.com.

> ....When writing SPARQL queries, should I be aware of any particular  
> index? Should I create new indexes myself (how)?

You 'can' create text-indexes for selected properties of your data for  
text search with a much better performance:

https://jena.apache.org/documentation/query/text-query.html
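
Why a text index on selected properties helps can be sketched with a toy inverted index over label literals. This is only an illustration of the general technique, not Jena's text-query API; a real text index such as Lucene does far more (analysis, scoring, phrase queries). The class name, the sample subjects and the labels are all hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokens(text):
    """Lower-cased word tokens of a literal."""
    return [t.lower() for t in TOKEN.findall(text)]


class LabelTextIndex:
    """Toy inverted index: token -> set of subjects whose label contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, subject, label):
        for tok in tokens(label):
            self.postings[tok].add(subject)

    def search(self, query):
        """Subjects whose label contains every word of the query."""
        words = tokens(query)
        if not words:
            return set()
        sets = [self.postings.get(w, set()) for w in words]
        return set.intersection(*sets)


labels = {
    "ex:jena": "Apache Jena",
    "ex:lucene": "Apache Lucene",
    "ex:tdb": "Jena TDB native store",
}

idx = LabelTextIndex()
for subject, label in labels.items():
    idx.index(subject, label)

jena_hits = idx.search("jena")
apache_jena_hits = idx.search("Apache Jena")
```

A lookup touches only the postings for the query words, whereas a FILTER with a regular expression over unindexed literals has to test every candidate literal.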
