Posted to users@jena.apache.org by David Habgood <dc...@gmail.com> on 2023/02/13 12:59:15 UTC

Lucene Faceted search

Hi Jena Users,

I'm interested in extending the Jena Lucene capabilities to include
Lucene's faceted search (
https://javadoc.io/doc/org.apache.lucene/lucene-facet/latest/index.html).

As far as I can tell from searching the mailing list (and github) the
Lucene faceted search capability hasn't been exposed in Jena before.

I think it could be exposed as follows:
1. Defining how faceted search concepts can be expressed in the Jena
dataset configuration
2. Extending the current indexing code to also generate the facet index
based on definitions in 1.
3. Adding a new query function for faceted search e.g. text:facet
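
As a rough illustration of the three steps above, here is a minimal sketch in plain Python. It is not Jena or Lucene code: FACET_CONFIG, build_facet_index and facet_counts are hypothetical names, and the example.org predicates are made up. It only shows the shape of the idea: a configuration selects the predicates to facet on, indexing builds a per-facet structure, and a query reads counts from it.

```python
from collections import Counter, defaultdict

# Step 1 (hypothetical configuration): predicates to index as facets.
FACET_CONFIG = {
    "http://example.org/colour": "colour",
    "http://example.org/material": "material",
}

def build_facet_index(triples, config):
    """Step 2: map facet name -> value -> set of subjects.

    Only predicates listed in `config` are indexed.
    """
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        name = config.get(p)
        if name is not None:
            index[name][o].add(s)
    return index

def facet_counts(index, facet, matching=None):
    """Step 3: count subjects per value of one facet.

    If `matching` is given, only subjects in that set are counted.
    """
    counts = Counter()
    for value, subjects in index[facet].items():
        hits = subjects if matching is None else subjects & matching
        if hits:
            counts[value] = len(hits)
    return counts

triples = [
    ("ex:a", "http://example.org/colour", "red"),
    ("ex:b", "http://example.org/colour", "red"),
    ("ex:c", "http://example.org/colour", "blue"),
    ("ex:a", "http://example.org/material", "wood"),
    ("ex:a", "http://example.org/label", "chair"),  # not configured, ignored
]
index = build_facet_index(triples, FACET_CONFIG)
print(facet_counts(index, "colour").most_common())
# prints [('red', 2), ('blue', 1)]
```

In a real implementation the index would presumably live in Lucene alongside the text index rather than in memory; the sketch makes no claim about how that would be wired up.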

Keen to hear if anyone can see issues with this approach or has other
feedback.

Thanks
David

Re: Lucene Faceted search

Posted by David Habgood <dc...@gmail.com>.
I'll have a play with the Lucene functionality and see if I can come up
with a more detailed model of how it could work. I can see there are options
around "hierarchical", "taxonomy index", etc., and I would like to understand
these better.

On Thu, Feb 16, 2023 at 7:47 AM Andy Seaborne <an...@apache.org> wrote:

>
>
> On 14/02/2023 09:10, Øyvind Gjesdal wrote:
> > Hi,
> >
> > This is also something I've thought about, since we have a dated
> > Elasticsearch integration for creating facets from endpoints, and we use
> > aggregated SPARQL queries for counting, which sometimes become slow-ish
> > and have to be turned off for larger datasets.
> >
> > An idea I had for how point 3 could look was to extend the
> > text query syntax with one named variable for facets, which could also
> > contain the results.
> > Using the example from the jena-text documentation:
> >
> > (?s ?score ?literal ?g ?facets) text:query 'word' would add "?facets"
> > optionally to the possible syntax.
>
> Would it be better to introduce text:facetQuery which has inputs and
> outputs specifically for faceted search? An all-purpose property
> function may get unwieldy and user-error prone.
>

Yes, I think so.

> The other choice staying within SPARQL 1.1 syntax is SERVICE --
> https://jena.apache.org/documentation/query/service_enhancer.html
> which in effect gives named arguments.
>
> Syntax outside of SPARQL 1.1 syntax is also possible. Having text
> search/faceted search have its own syntax (over the same underlying
> machinery) seems reasonable given how important it is.
>
> > I don't know what the type of the list ?facets (categories and counts)
> > should be, I initially thought it would be nice to have as json, but see
> > that one graph database implements facet results as blank nodes.
>

Which database was this? For my use cases I'd prefer RDF over a micro
format, as there are likely to be subsequent queries based on the results of
the first, and RDF would be easier to parse.

> > An option could be just adding an additional parsable string to the
> > text:query extension function, but it is kind of already rich, so I think
> > text:facet is a good idea to not bloat the text:query.
> >
> > There are probably multiple use-cases there as well, such as range,
> > multiple values on same facet, so this idea may end up looking a bit
> > hackish:
> >
> > ?s text:query ( property* 'query string' limit 'lang:xx' 'highlight:yy'
> > 'facets: facet1: "value1", facet1: "value2"; facet2 : ...')
> >
> > I'm very happy to see others also interested in this use case.
> >
> > Best regards,
> > Øyvind
> >
> > On Tue, Feb 14, 2023 at 6:52 AM David Habgood <dc...@gmail.com> wrote:
> >
> >> Thanks for the link Andy,
> >>
> >> @Elie my specific use case is this:
> >>
> >> I have millions of records with perhaps 100 unique attributes across the
> >> records. Individual records may only have 5-10 attributes though. So a user
> >> who wishes to browse the data based on attributes can progressively filter
> >> the data to find individual or groups of records. When a user selects a
> >> facet, only those (additional) facets for which records exist are displayed
> >> as options, along with counts.
> >>
> >> It is possible with regular SPARQL GROUP BY and COUNT queries but not so
> >> performant.
> >>
> >> Cheers
> >>
> >> On Tue, Feb 14, 2023 at 2:58 AM Andy Seaborne <an...@apache.org> wrote:
> >>
> >>>
> >>>
> >>> On 13/02/2023 12:59, David Habgood wrote:
> >>>> Hi Jena Users,
> >>>>
> >>>> I'm interested in extending the Jena Lucene capabilities to include
> >>>> Lucene's faceted search (
> >>>> https://javadoc.io/doc/org.apache.lucene/lucene-facet/latest/index.html).
> >>>
> >>> https://lucene.apache.org/core/9_5_0/demo/org/apache/lucene/demo/facet/package-summary.html
> >>>
> >>>>
> >>>> As far as I can tell from searching the mailing list (and github) the
> >>>> Lucene faceted search capability hasn't been exposed in Jena before.
> >>>>
> >>>> I think it could be exposed as follows:
> >>>> 1. Defining how faceted search concepts can be expressed in the Jena
> >>>> dataset configuration
> >>>> 2. Extending the current indexing code to also generate the facet index
> >>>> based on definitions in 1.
> >>>> 3. Adding a new query function for faceted search e.g. text:facet
> >>>>
> >>>> Keen to hear if anyone can see issues with this approach or has other
> >>>> feedback.
> >>>>
> >>>> Thanks
> >>>> David
> >>>>
> >>>
> >>
> >
>

Re: Lucene Faceted search

Posted by Andy Seaborne <an...@apache.org>.

On 14/02/2023 09:10, Øyvind Gjesdal wrote:
> Hi,
> 
> This is also something I've thought about, since we have a dated
> Elasticsearch integration for creating facets from endpoints, and we use
> aggregated SPARQL queries for counting, which sometimes become slow-ish
> and have to be turned off for larger datasets.
> 
> An idea I had for how point 3 could look was to extend the
> text query syntax with one named variable for facets, which could also
> contain the results.
> Using the example from the jena-text documentation:
> 
> (?s ?score ?literal ?g ?facets) text:query 'word' would add "?facets"
> optionally to the possible syntax.

Would it be better to introduce text:facetQuery which has inputs and
outputs specifically for faceted search? An all-purpose property
function may get unwieldy and user-error prone.

The other choice staying within SPARQL 1.1 syntax is SERVICE --
https://jena.apache.org/documentation/query/service_enhancer.html
which in effect gives named arguments.

Syntax outside of SPARQL 1.1 syntax is also possible. Having text
search/faceted search have its own syntax (over the same underlying
machinery) seems reasonable given how important it is.

> I don't know what the type of the list ?facets (categories and counts)
> should be, I initially thought it would be nice to have as json, but see
> that one graph database implements facet results as blank nodes.
> An option could be just adding an additional parsable string to the
> text:query extension function, but it is kind of already rich, so I think
> text:facet is a good idea to not bloat the text:query.
> 
> There are probably multiple use-cases there as well, such as range,
> multiple values on same facet, so this idea may end up looking a bit
> hackish:
> 
> ?s text:query ( property* 'query string' limit 'lang:xx' 'highlight:yy'
> 'facets: facet1: "value1", facet1: "value2"; facet2 : ...')
> 
> I'm very happy to see others also interested in this use case.
> 
> Best regards,
> Øyvind
> 
> On Tue, Feb 14, 2023 at 6:52 AM David Habgood <dc...@gmail.com> wrote:
> 
>> Thanks for the link Andy,
>>
>> @Elie my specific use case is this:
>>
>> I have millions of records with perhaps 100 unique attributes across the
>> records. Individual records may only have 5-10 attributes though. So a user
>> who wishes to browse the data based on attributes can progressively filter
>> the data to find individual or groups of records. When a user selects a
>> facet, only those (additional) facets for which records exist are displayed
>> as options, along with counts.
>>
>> It is possible with regular SPARQL GROUP BY and COUNT queries but not so
>> performant.
>>
>> Cheers
>>
>> On Tue, Feb 14, 2023 at 2:58 AM Andy Seaborne <an...@apache.org> wrote:
>>
>>>
>>>
>>> On 13/02/2023 12:59, David Habgood wrote:
>>>> Hi Jena Users,
>>>>
>>>> I'm interested in extending the Jena Lucene capabilities to include
>>>> Lucene's faceted search (
>>>> https://javadoc.io/doc/org.apache.lucene/lucene-facet/latest/index.html).
>>>
>>> https://lucene.apache.org/core/9_5_0/demo/org/apache/lucene/demo/facet/package-summary.html
>>>
>>>>
>>>> As far as I can tell from searching the mailing list (and github) the
>>>> Lucene faceted search capability hasn't been exposed in Jena before.
>>>>
>>>> I think it could be exposed as follows:
>>>> 1. Defining how faceted search concepts can be expressed in the Jena
>>>> dataset configuration
>>>> 2. Extending the current indexing code to also generate the facet index
>>>> based on definitions in 1.
>>>> 3. Adding a new query function for faceted search e.g. text:facet
>>>>
>>>> Keen to hear if anyone can see issues with this approach or has other
>>>> feedback.
>>>>
>>>> Thanks
>>>> David
>>>>
>>>
>>
> 

Re: Lucene Faceted search

Posted by Øyvind Gjesdal <oy...@gmail.com>.
Hi,

This is also something I've thought about, since we have a dated
Elasticsearch integration for creating facets from endpoints, and we use
aggregated SPARQL queries for counting, which sometimes become slow-ish
and have to be turned off for larger datasets.

An idea I had for how point 3 could look was to extend the
text query syntax with one named variable for facets, which could also
contain the results.
Using the example from the jena-text documentation:

(?s ?score ?literal ?g ?facets) text:query 'word' would add "?facets"
optionally to the possible syntax.

I don't know what the type of the list ?facets (categories and counts)
should be. I initially thought it would be nice to have it as JSON, but I see
that one graph database implements facet results as blank nodes.
An option could be just adding an additional parsable string to the
text:query property function, but it is already rich, so I think
text:facet is a good idea to avoid bloating text:query.
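
To make the two result shapes easier to compare, here is a small plain-Python sketch that renders the same facet counts once as JSON and once as Turtle with one blank node per value. Everything in it is an assumption for illustration: the function names, the ex: prefix and the ex:facet / ex:value / ex:count properties are hypothetical and are not taken from Jena or from any existing database.

```python
import json

def facets_as_json(facet, counts):
    """Serialise one facet's value counts as a JSON string."""
    values = [{"value": v, "count": c} for v, c in sorted(counts.items())]
    return json.dumps({"facet": facet, "values": values})

def facets_as_turtle(facet, counts):
    """Serialise the same counts as Turtle, one blank node per value.

    The ex: vocabulary is hypothetical. json.dumps is used only to
    quote and escape the strings, which is adequate for simple values.
    """
    lines = ["@prefix ex: <http://example.org/facet#> ."]
    for value, count in sorted(counts.items()):
        lines.append(
            f"[] ex:facet {json.dumps(facet)} ; "
            f"ex:value {json.dumps(value)} ; ex:count {count} ."
        )
    return "\n".join(lines)

counts = {"red": 2, "blue": 1}
print(facets_as_json("colour", counts))
print(facets_as_turtle("colour", counts))
```

The JSON form would arrive as a single literal that the client has to parse, while the blank-node form could be matched by further triple patterns in the same query.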

There are probably multiple use-cases there as well, such as range,
multiple values on same facet, so this idea may end up looking a bit
hackish:

?s text:query ( property* 'query string' limit 'lang:xx' 'highlight:yy'
'facets: facet1: "value1", facet1: "value2"; facet2 : ...')

I'm very happy to see others also interested in this use case.

Best regards,
Øyvind

On Tue, Feb 14, 2023 at 6:52 AM David Habgood <dc...@gmail.com> wrote:

> Thanks for the link Andy,
>
> @Elie my specific use case is this:
>
> I have millions of records with perhaps 100 unique attributes across the
> records. Individual records may only have 5-10 attributes though. So a user
> who wishes to browse the data based on attributes can progressively filter
> the data to find individual or groups of records. When a user selects a
> facet, only those (additional) facets for which records exist are displayed
> as options, along with counts.
>
> It is possible with regular SPARQL GROUP BY and COUNT queries but not so
> performant.
>
> Cheers
>
> On Tue, Feb 14, 2023 at 2:58 AM Andy Seaborne <an...@apache.org> wrote:
>
> >
> >
> > On 13/02/2023 12:59, David Habgood wrote:
> > > Hi Jena Users,
> > >
> > > I'm interested in extending the Jena Lucene capabilities to include
> > > Lucene's faceted search (
> > > https://javadoc.io/doc/org.apache.lucene/lucene-facet/latest/index.html).
> >
> > https://lucene.apache.org/core/9_5_0/demo/org/apache/lucene/demo/facet/package-summary.html
> >
> > >
> > > As far as I can tell from searching the mailing list (and github) the
> > > Lucene faceted search capability hasn't been exposed in Jena before.
> > >
> > > I think it could be exposed as follows:
> > > 1. Defining how faceted search concepts can be expressed in the Jena
> > > dataset configuration
> > > 2. Extending the current indexing code to also generate the facet index
> > > based on definitions in 1.
> > > 3. Adding a new query function for faceted search e.g. text:facet
> > >
> > > Keen to hear if anyone can see issues with this approach or has other
> > > feedback.
> > >
> > > Thanks
> > > David
> > >
> >
>

Re: Lucene Faceted search

Posted by David Habgood <dc...@gmail.com>.
Thanks for the link Andy,

@Elie my specific use case is this:

I have millions of records with perhaps 100 unique attributes across the
records. Individual records may only have 5-10 attributes though. So a user
who wishes to browse the data based on attributes can progressively filter
the data to find individual or groups of records. When a user selects a
facet, only those (additional) facets for which records exist are displayed
as options, along with counts.

It is possible with regular SPARQL GROUP BY and COUNT queries but not so
performant.
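
The browsing behaviour described above can be sketched in a few lines of plain Python. This is an in-memory illustration only, with made-up records and hypothetical function names (drill_down, remaining_facets); it says nothing about how Lucene or Jena would compute the same counts efficiently.

```python
from collections import Counter

# Hypothetical records: each carries only a few of the possible attributes.
RECORDS = {
    "r1": {"type": "map", "era": "1800s", "region": "north"},
    "r2": {"type": "map", "era": "1900s"},
    "r3": {"type": "photo", "era": "1900s", "region": "south"},
    "r4": {"type": "photo"},
}

def drill_down(records, selections):
    """Return ids of records matching every selected attribute/value pair."""
    return {
        rid for rid, attrs in records.items()
        if all(attrs.get(a) == v for a, v in selections.items())
    }

def remaining_facets(records, selections):
    """Count values of the unselected attributes among matching records.

    Attributes that no matching record carries never appear in the
    result, so only facets with at least one record are offered.
    """
    facets = {}
    for rid in drill_down(records, selections):
        for attr, value in records[rid].items():
            if attr not in selections:
                facets.setdefault(attr, Counter())[value] += 1
    return facets

# After the user selects type=map, only "era" and "region" remain as options.
print(remaining_facets(RECORDS, {"type": "map"}))
```

Each further selection is added to the selections dict and the counts are recomputed over the smaller matching set, which is the step that GROUP BY and COUNT queries make expensive on large datasets.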

Cheers

On Tue, Feb 14, 2023 at 2:58 AM Andy Seaborne <an...@apache.org> wrote:

>
>
> On 13/02/2023 12:59, David Habgood wrote:
> > Hi Jena Users,
> >
> > I'm interested in extending the Jena Lucene capabilities to include
> > Lucene's faceted search (
> > https://javadoc.io/doc/org.apache.lucene/lucene-facet/latest/index.html).
>
>
> https://lucene.apache.org/core/9_5_0/demo/org/apache/lucene/demo/facet/package-summary.html
>
>
> >
> > As far as I can tell from searching the mailing list (and github) the
> > Lucene faceted search capability hasn't been exposed in Jena before.
> >
> > I think it could be exposed as follows:
> > 1. Defining how faceted search concepts can be expressed in the Jena
> > dataset configuration
> > 2. Extending the current indexing code to also generate the facet index
> > based on definitions in 1.
> > 3. Adding a new query function for faceted search e.g. text:facet
> >
> > Keen to hear if anyone can see issues with this approach or has other
> > feedback.
> >
> > Thanks
> > David
> >
>

Re: Lucene Faceted search

Posted by Andy Seaborne <an...@apache.org>.

On 13/02/2023 12:59, David Habgood wrote:
> Hi Jena Users,
> 
> I'm interested in extending the Jena Lucene capabilities to include
> Lucene's faceted search (
> https://javadoc.io/doc/org.apache.lucene/lucene-facet/latest/index.html).

https://lucene.apache.org/core/9_5_0/demo/org/apache/lucene/demo/facet/package-summary.html


> 
> As far as I can tell from searching the mailing list (and github) the
> Lucene faceted search capability hasn't been exposed in Jena before.
> 
> I think it could be exposed as follows:
> 1. Defining how faceted search concepts can be expressed in the Jena
> dataset configuration
> 2. Extending the current indexing code to also generate the facet index
> based on definitions in 1.
> 3. Adding a new query function for faceted search e.g. text:facet
> 
> Keen to hear if anyone can see issues with this approach or has other
> feedback.
> 
> Thanks
> David
> 

Re: Lucene Faceted search

Posted by Élie Roux <ro...@gmail.com>.
Dear David,

I think extending Jena's Lucene integration is a good idea, but I'm not
exactly sure what you mean (partly because I'm not very familiar with
Lucene's faceted search). Can you give an example?

Best,
-- 
Elie