Posted to dev@jena.apache.org by "Code Ferret (JIRA)" <ji...@apache.org> on 2019/06/18 17:48:00 UTC

[jira] [Created] (JENA-1723) jena:text create OR's of Lucene fields

Code Ferret created JENA-1723:
---------------------------------

             Summary: jena:text create OR's of Lucene fields
                 Key: JENA-1723
                 URL: https://issues.apache.org/jira/browse/JENA-1723
             Project: Apache Jena
          Issue Type: New Feature
          Components: Jena
    Affects Versions: Jena 3.13.0
            Reporter: Code Ferret
            Assignee: Code Ferret


h3. Motivation:

With the current {{jena:text}} we often find that we have query patterns such as:
{code}
select ?s ?sc ?lit where {
  {
    (?s ?sc ?lit) text:query ( rdfs:label "some query" "highlight:" ).
  }
  union
  {
    (?s ?sc ?lit) text:query ( skos:altLabel "some query" "highlight:" ).
  }
  union
  {
    (?s ?sc ?lit) text:query ( skos:prefLabel "some query" "highlight:" ).
  }
}
{code}
Patterns like this recur for various sets of RDF properties, each property corresponding to some Lucene field.

It can be more performant to _push_ the {{unions}} into the Lucene query by rewriting as:
{code}
(altLabel:"some query" OR prefLabel:"some query" OR label:"some query")
{code}
Then it's a single query with Lucene performing the {{unions}}.
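The shape of that rewrite can be sketched as a small helper that joins one phrase clause per field with {{OR}}. This is only an illustration: the class and method names are hypothetical and not part of jena-text, and it assumes the field names are already valid Lucene field names and that only backslashes and double quotes in the phrase need escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of jena-text: builds one Lucene query string
// that ORs the same quoted phrase across several fields.
final class OrQuerySketch {
    static String orOverFields(List<String> fields, String phrase) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("at least one field is required");
        }
        // Escape backslashes and embedded double quotes so the phrase stays a single quoted term.
        String quoted = "\"" + phrase.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return fields.stream()
                .map(field -> field + ":" + quoted)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }
}
```

Calling {{OrQuerySketch.orOverFields(List.of("altLabel", "prefLabel", "label"), "some query")}} yields the query string shown above.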

h3. Approach:

We've implemented this by 

1. adding a new assembler feature in {{text:TextIndexLucene}}:
{code}
[] text:props (
    text:propList [ text:propListProp  ex:labels ;
         text:props ( skos:prefLabel skos:altLabel rdfs:label ) ]
) ;
{code}
This assigns a single _Property_ id, e.g., {{ex:labels}}, to a list of properties.

and

2. adding some syntax to the {{TextQueryPF}}:
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels "some query" "highlight:" )
{code}
The fifth output arg, {{?prop}}, returns the specific property that matched. If the input args include {{text:props}} as the first argument, then a list of at least one property must precede the query string. Each of these properties is either one of the usual Lucene indexed properties that occur in {{text:query}} or a property list property such as {{ex:labels}} above.

When a list property is encountered it is expanded to the underlying list of indexed properties from the configuration.

There may be any mix of indexed and property list properties following {{text:props}} in the input arg list:
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels rdfs:comment "some query" "highlight:" )
{code}
which searches over the three properties listed in {{ex:labels}} and the property {{rdfs:comment}}.
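The expansion step can be sketched roughly as follows. This is not the code from the pending PR: the class name, the use of plain strings for properties, and the map standing in for the assembler configuration are all assumptions made for illustration, as is the choice to drop duplicates while keeping first-seen order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the jena-text implementation: expands property list
// properties (e.g. ex:labels) into the indexed properties they stand for.
final class PropListSketch {
    static List<String> expand(List<String> props, Map<String, List<String>> propLists) {
        // Assumption: duplicates are dropped and first-seen order is preserved.
        Set<String> expanded = new LinkedHashSet<>();
        for (String prop : props) {
            List<String> members = propLists.get(prop);
            if (members != null) {
                expanded.addAll(members); // list property: substitute its configured members
            } else {
                expanded.add(prop);       // ordinary indexed property: keep as-is
            }
        }
        return new ArrayList<>(expanded);
    }
}
```

With {{ex:labels}} mapped to {{skos:prefLabel}}, {{skos:altLabel}} and {{rdfs:label}}, expanding the input {{( ex:labels rdfs:comment )}} gives those three properties followed by {{rdfs:comment}}.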

This functionality is implemented, including copious tests, and a PR can be issued after a bit of code cleanup.

h3. Discussion:

The use of {{text:props}} in the query form isn't strictly necessary, and was introduced as a way of indicating the intent to have a list of properties to be searched over. 

If the {{text:props}} _flag_ is removed from the implementation, the feature will simply check whether each property is a list property or an ordinary indexed property.

With this modification the above queries would be written simply as:
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( ex:labels "some query" "highlight:" )
{code}
or
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( ex:labels rdfs:comment "some query" "highlight:" )
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)