Posted to users@jena.apache.org by Adrian Gschwend <ml...@netlabs.org> on 2023/06/14 09:47:04 UTC
State of Elastic/Open Search support in Fuseki
According to https://jena.apache.org/documentation/query/text-query.html
there was support for text search using Elastic instead of Lucene in
Fuseki at some point at least. But from what I can see it was removed
(?) in 4.x.
We have a use case where faceted search is important, and this is quite
hard in SPARQL 1.1: paging and counting are less than ideal. Either the
queries get very complex or the counts are wrong.
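As a rough illustration of one way the counts can go wrong (a toy Python sketch over made-up data, not a SPARQL engine; all names and values are hypothetical): when a record has several values for a property, a join yields one row per combination, so counting or paging over rows does not match counting or paging over records.

```python
from collections import Counter

# Hypothetical data: each record has one type and any number of topics.
records = {
    "ex:r1": {"type": "Letter", "topics": ["war", "trade"]},
    "ex:r2": {"type": "Letter", "topics": ["trade"]},
    "ex:r3": {"type": "Map", "topics": ["war", "trade", "sea"]},
}

# Roughly what a pattern like { ?r a ?type ; ex:topic ?topic } yields:
# one solution row per (record, topic) pair.
rows = [(r, d["type"], t) for r, d in records.items() for t in d["topics"]]

# Naive facet: COUNT(*) grouped by type counts rows, not records.
naive = Counter(rtype for _, rtype, _ in rows)

# Intended facet: COUNT(DISTINCT ?r) grouped by type.
distinct = Counter(rtype for _, rtype in {(r, rtype) for r, rtype, _ in rows})


def page(items, offset, limit):
    """LIMIT/OFFSET over rows; a page of rows is not a page of records."""
    return items[offset:offset + limit]


first_page_records = {r for r, _, _ in page(rows, 0, 3)}

print(naive)
print(distinct)
print(first_page_records)
```

Here a page of three rows contains only two distinct records, which is the kind of mismatch that tends to push the real queries towards subqueries and DISTINCT counts.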
What was the reason for removing that code? Lack of maintenance? If so,
any idea how much work it would be to bring it back to the 4.x codebase?
I guess it might make sense to switch to OpenSearch instead, given the
Elastic license issues.
Has anyone used Elastic with Fuseki and can share some experience?
I found some posts here and there but not much detail about how the
integration worked.
regards
Adrian
Re: State of Elastic/Open Search support in Fuseki
Posted by Nicholas Car <ni...@kurrawong.net>.
We use Lucene in Fuseki 4.x quite successfully. Perhaps support for Elastic was removed simply because Lucene is supported and is fine for most use cases.
Lucene support does not seem to make faceting available (as recently discussed here by David in my company), so there are likely Lucene improvements that can be made.
Can you articulate what advantages you see in Elastic/OpenSearch support over Lucene?
Nick
On Wed, Jun 14, 2023 at 7:47 pm, Adrian Gschwend <ml...@netlabs.org> wrote:
> According to https://jena.apache.org/documentation/query/text-query.html
> there was support for text search using Elastic instead of Lucene in
> Fuseki at some point at least. But from what I can see it was removed
> (?) in 4.x.
>
> We have a use-case where faceted search is important and this is quite
> hard in SPARQL 1.1, paging & counting is less than ideal. Either the
> queries get very complex or the counts are wrong.
>
> What was the reason for removing that code, lack of maintenance? If so,
> any ideas on how much work it would be to bring this to the 4.x codebase
> again? I guess it might make sense to switch to OpenSearch as well
> instead with the Elastic license issues.
>
> Anyone has or had Elastic in use in Fuseki & can share some experience?
> I found some posts here and there but not much details about how the
> integration worked.
>
> regards
>
> Adrian
Re: State of Elastic/Open Search support in Fuseki
Posted by Andy Seaborne <an...@apache.org>.
On 19/06/2023 13:29, Adrian Gschwend wrote:
> On 16.06.23 21:53, Andy Seaborne wrote:
>
> Hi Andy,
>
>> From the documentation:
> ah thanks, I read it but have to wrap my head around it with some
> examples to understand what happens.
>
>> There is also the model of "One document equals one entity" model that
>> might be more appropriate faceted search. It returns the subject URI
>> with a Lucene document for multiple triples.
>
> same. That might be what I had in mind.
>
>> There then needs to be a facet property function. Would someone like
>> to sketch one out as a GH issue?
>
> I'll try to come up with some examples so we can see what would be useful.
>
>> ** ElasticSearch - if we can negotiate the licensing issues (the
>> client libs are OSS but to test them needs a server so it impacts the
>> build; there may be a testcontainers.io way round this, or optional
>> tests - we need the build to be clean as well as the produced
>> binaries), then this could be done and/or solr. It does need someone
>> or someones to take an interest in this both now and for keeping the
>> code maintained especially if any security issues arise.
>
> But the licensing issues would be solved if we switch to OpenSearch or
> am I missing something?
Probably not. The feature sets aren't identical, but for Jena usage the
difference is probably not significant.
There is a testcontainer:
https://github.com/opensearch-project/opensearch-testcontainers
Andy
> And I agree on the interest, I will think about it more and see if we
> have a case that is worth spending some time/money on that.
>
> regards
>
> Adrian
Re: State of Elastic/Open Search support in Fuseki
Posted by Adrian Gschwend <ml...@netlabs.org>.
On 16.06.23 21:53, Andy Seaborne wrote:
Hi Andy,
> From the documentation:
ah thanks, I read it but have to wrap my head around it with some
examples to understand what happens.
> There is also the model of "One document equals one entity" model that
> might be more appropriate faceted search. It returns the subject URI
> with a Lucene document for multiple triples.
same. That might be what I had in mind.
> There then needs to be a facet property function. Would someone like to
> sketch one out as a GH issue?
I'll try to come up with some examples so we can see what would be useful.
> ** ElasticSearch - if we can negotiate the licensing issues (the client
> libs are OSS but to test them needs a server so it impacts the build;
> there may be a testcontainers.io way round this, or optional tests - we
> need the build to be clean as well as the produced binaries), then this
> could be done and/or solr. It does need someone or someones to take an
> interest in this both now and for keeping the code maintained especially
> if any security issues arise.
But the licensing issues would be solved if we switch to OpenSearch or
am I missing something?
And I agree on the interest, I will think about it more and see if we
have a case that is worth spending some time/money on that.
regards
Adrian
Re: State of Elastic/Open Search support in Fuseki
Posted by Andy Seaborne <an...@apache.org>.
** Faceted search
From the documentation:
There is also the "One document equals one entity" model, which
might be more appropriate for faceted search. It returns the subject URI,
with a Lucene document for multiple triples.
"""
When using this integration model, text:query returns the subject URI
for the document
"""
There then needs to be a facet property function. Would someone like to
sketch one out as a GH issue?
** ElasticSearch - if we can negotiate the licensing issues, then this
could be done, and/or Solr. The client libs are OSS, but testing them
needs a server, so it impacts the build; there may be a testcontainers.io
way round this, or optional tests. We need the build to be clean as well
as the produced binaries. It does need someone or someones to take an
interest in this, both now and for keeping the code maintained, especially
if any security issues arise.
Andy
On 15/06/2023 12:49, Adrian Gschwend wrote:
> On 14.06.23 14:45, Øyvind Gjesdal wrote:
>
> Hi Øyvind,
>
>
>> Facet/aggregation was not implemented as extension functions in SPARQL
>> and I believe that it also used the same abstraction described in the
>> jena-text docs:
>>
>>> One Jena *triple* equals one Lucene *document*
>> which makes aggregations/facets not available or usable neither from the
>> Elasticsearch APIs.
>
> yes I saw that and I also thought that's probably not ideal. I don't
> know much about Elastic in practice, I mainly read tutorials &
> documentation. What I had in mind was that we could define for example
> via SHACL shape (or something comparable) what a "document" contains. So
> it's shapes that would define how we see the document and we could use
> this abstraction for search. So the integration would take SHACL shapes,
> create a "document" out of it that is consumable by Elastic and then we
> could use this for search.
>
> The second thing is that I'm mainly interested in an integration that we
> don't have to update the Elastic index on our own. I guess that the
> Fuseki integration takes care of that so it's "in sync" all the time. I
> would want the Elastic API available as well as this is easier to use
> for the facet use-cases than pure SPARQL. Paging is not trivial in
> SPARQL for use-cases like this, the Elastic API however is built for that.
>
>> We switched to jena-text with Lucene after some weeks, which didn't have
>> aggregations either, but there was much more activity and usage for the
>> module, and the options for configuring from the assembler files were
>> much richer.
>
> ok, any example of what you configure in there? I don't think I saw much
> in the documentation for that so far. Aggregations are definitely
> something I would like to have. One example are archival records, where
> we have a hierarchy in the data. And I need to be able to show that
> hierarchy per record (which has it's own IRI) and to browse by hierarchy
> levels as well. This is super easy to represent in RDF but super hard to
> query efficiently.
>
>> At the moment I'm unsure if I inspected and looked at the Elasticsearch
>> APIs directly to check the structure of the documents in the index
>> itself, after indexing.
>
> What versions did you work on with Elastic?
>
> regards
>
> Adrian
>
Re: State of Elastic/Open Search support in Fuseki
Posted by Øyvind Gjesdal <oy...@gmail.com>.
Hi Adrian,
From the git history I can see that we were using Elasticsearch 6.4.3 and
Fuseki 3.10.0 when we experimented with it and had it running.
AFAIK both Lucene and Elasticsearch index the data for you as the
triplestore updates, without you having to do anything, using a normal
configuration.
We mostly ended up using the defaults for jena-text and Lucene, but I think
what I missed was changing the analyzer or tokenizer. On second thought,
though, maybe I could have used the Elasticsearch settings REST endpoint?
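For what it's worth, a minimal sketch of what such a settings body could look like, written as Python that only builds and prints the JSON. The index name and analyzer name are made up, and whether the old Jena Elasticsearch module would have picked up an analyzer defined this way is an assumption I have not verified.

```python
import json

# Hypothetical index and analyzer names; the body follows the shape of the
# "settings.analysis" section that Elasticsearch accepts when creating an index.
INDEX_NAME = "jena-text-example"

settings_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    }
}

# This would be the JSON body of a create-index request (PUT /<index>).
# No request is sent here; the sketch only prints what would be sent.
request_line = f"PUT /{INDEX_NAME}"
payload = json.dumps(settings_body, indent=2)

print(request_line)
print(payload)
```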
Best regards
Øyvind
Re: State of Elastic/Open Search support in Fuseki
Posted by Adrian Gschwend <ml...@netlabs.org>.
On 14.06.23 14:45, Øyvind Gjesdal wrote:
Hi Øyvind,
> Facet/aggregation was not implemented as extension functions in SPARQL and
> I believe that it also used the same abstraction described in the jena-text
> docs:
>
>> One Jena *triple* equals one Lucene *document*
> which makes aggregations/facets not available or usable neither from the
> Elasticsearch APIs.
Yes, I saw that, and I also thought that's probably not ideal. I don't
know much about Elastic in practice; I have mainly read tutorials and
documentation. What I had in mind was that we could define, for example
via SHACL shapes (or something comparable), what a "document" contains. The
shapes would define how we see the document, and we could use this
abstraction for search: the integration would take the SHACL shapes,
create a "document" from them that is consumable by Elastic, and then we
could use this for search.
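To make the idea a bit more concrete, here is a small Python sketch under stated assumptions: a plain dict stands in for a SHACL node shape (it is not real SHACL syntax), triples are simple tuples, and all prefixes and property names are invented for the example.

```python
# Hypothetical stand-in for a SHACL node shape: a target class plus the
# properties that make up one search document.
RECORD_SHAPE = {
    "target_class": "ex:Record",
    "properties": {
        "title": "dcterms:title",
        "level": "ex:level",
        "parent": "ex:partOf",
    },
}

triples = [
    ("ex:r1", "rdf:type", "ex:Record"),
    ("ex:r1", "dcterms:title", "Letter about trade"),
    ("ex:r1", "ex:level", "item"),
    ("ex:r1", "ex:partOf", "ex:f1"),
    ("ex:r1", "ex:internalNote", "not part of the shape"),
    ("ex:a1", "rdf:type", "ex:Agent"),
    ("ex:a1", "dcterms:title", "Some agent"),
]


def build_documents(triples, shape):
    """Return {subject: document} for subjects of the shape's target class."""
    targets = {
        s for s, p, o in triples
        if p == "rdf:type" and o == shape["target_class"]
    }
    field_for = {pred: name for name, pred in shape["properties"].items()}
    docs = {s: {"uri": s} for s in targets}
    for s, p, o in triples:
        if s in docs and p in field_for:
            # Multi-valued properties collect into a list.
            docs[s].setdefault(field_for[p], []).append(o)
    return docs


docs = build_documents(triples, RECORD_SHAPE)
print(docs)
```

Only the properties named in the shape end up in the document, and subjects of other classes are skipped, so the shape acts as the definition of what the search index sees.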
The second thing is that I'm mainly interested in an integration where we
don't have to update the Elastic index on our own. I guess that the
Fuseki integration takes care of that, so it's "in sync" all the time. I
would want the Elastic API available as well, as it is easier to use
for the facet use cases than pure SPARQL. Paging is not trivial in
SPARQL for use cases like this; the Elastic API, however, is built for that.
> We switched to jena-text with Lucene after some weeks, which didn't have
> aggregations either, but there was much more activity and usage for the
> module, and the options for configuring from the assembler files were much
> richer.
OK, any example of what you configure in there? I don't think I have seen
much in the documentation for that so far. Aggregations are definitely
something I would like to have. One example is archival records, where
we have a hierarchy in the data. I need to be able to show that
hierarchy per record (which has its own IRI) and to browse by hierarchy
levels as well. This is super easy to represent in RDF but super hard to
query efficiently.
> At the moment I'm unsure if I inspected and looked at the Elasticsearch
> APIs directly to check the structure of the documents in the index itself,
> after indexing.
Which versions of Elastic did you work with?
regards
Adrian
Re: State of Elastic/Open Search support in Fuseki
Posted by Øyvind Gjesdal <oy...@gmail.com>.
Hi Adrian,
We tried the Elasticsearch module when it was available and had your
use case in mind, with facets. But as far as I remember, it was not
possible to use aggregations (at least from the SPARQL side of things).
I understood the Elasticsearch module as an alternative to the Lucene
module, used in a similar manner, using the same extension function
(text:query).
Facet/aggregation was not implemented as extension functions in SPARQL, and
I believe that it also used the same abstraction described in the jena-text
docs:
> One Jena *triple* equals one Lucene *document*
which makes aggregations/facets unavailable, or unusable, from the
Elasticsearch APIs as well.
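A toy Python sketch of why that abstraction gets in the way of facets (made-up data and field names; this is not how jena-text or Elasticsearch store anything internally): when each triple is its own document, a document that matches the text query carries no other fields to aggregate on, whereas an entity-per-document layout keeps them together.

```python
from collections import Counter, defaultdict

# Hypothetical triples (subject, predicate, literal).
triples = [
    ("ex:r1", "rdfs:label", "Letter about trade"),
    ("ex:r1", "ex:level", "item"),
    ("ex:r2", "rdfs:label", "Trade ledger"),
    ("ex:r2", "ex:level", "file"),
    ("ex:r3", "rdfs:label", "Harbour map"),
    ("ex:r3", "ex:level", "item"),
]

# Model A: one triple = one document. Each document holds a single field.
triple_docs = [{"uri": s, p: o} for s, p, o in triples]

# Model B: one entity = one document. All fields of a subject sit together.
entity_docs = defaultdict(dict)
for s, p, o in triples:
    entity_docs[s]["uri"] = s
    entity_docs[s][p] = o


def facet(docs, text, facet_field):
    """Count facet_field values over documents whose label contains text."""
    hits = [d for d in docs if text in d.get("rdfs:label", "").lower()]
    return Counter(d[facet_field] for d in hits if facet_field in d)


by_triple = facet(triple_docs, "trade", "ex:level")
by_entity = facet(entity_docs.values(), "trade", "ex:level")

print(by_triple)
print(by_entity)
```

With model A the facet comes back empty because the matching label documents have no "ex:level" field; with model B the same query yields one "item" and one "file".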
We switched to jena-text with Lucene after some weeks, which didn't have
aggregations either, but there was much more activity and usage for the
module, and the options for configuring from the assembler files were much
richer.
Just a disclaimer that this was a long time ago, and I could have
misunderstood how things worked.
I'm no longer sure whether I looked at the Elasticsearch APIs directly
to check the structure of the documents in the index itself after
indexing.
Best regards,
Øyvind
On Wed, Jun 14, 2023 at 11:48 AM Adrian Gschwend <ml...@netlabs.org> wrote:
>
> According to https://jena.apache.org/documentation/query/text-query.html
> there was support for text search using Elastic instead of Lucene in
> Fuseki at some point at least. But from what I can see it was removed
> (?) in 4.x.
>
> We have a use-case where faceted search is important and this is quite
> hard in SPARQL 1.1, paging & counting is less than ideal. Either the
> queries get very complex or the counts are wrong.
>
> What was the reason for removing that code, lack of maintenance? If so,
> any ideas on how much work it would be to bring this to the 4.x codebase
> again? I guess it might make sense to switch to OpenSearch as well
> instead with the Elastic license issues.
>
> Anyone has or had Elastic in use in Fuseki & can share some experience?
> I found some posts here and there but not much details about how the
> integration worked.
>
> regards
>
> Adrian
>
>