Posted to dev@jena.apache.org by "Osma Suominen (JIRA)" <ji...@apache.org> on 2017/11/06 10:38:00 UTC

[jira] [Commented] (JENA-1388) Lucene text search across multiple fields ("AND") yields no results

    [ https://issues.apache.org/jira/browse/JENA-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240123#comment-16240123 ] 

Osma Suominen commented on JENA-1388:
-------------------------------------

What [~andy.seaborne] says is correct. The way jena-text works with the Lucene backend is that it creates a separate Lucene document for each triple. So one triple (or quad, when graph-specific indexing is enabled) corresponds to one document in the Lucene index. The upside is that this makes it rather simple to synchronize updates between the triple store and the Lucene index: for new triples, add documents into Lucene; for deleted triples, delete the corresponding documents from Lucene. The downside is that AND queries across multiple fields cannot be supported, because no single document holds the indexed text of more than one property. This is a pretty fundamental design choice in jena-text, so it cannot simply be fixed like a normal bug. It would require reengineering significant parts of the jena-text subsystem.
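
To illustrate why the one-document-per-triple layout behaves this way, here is a toy in-memory model in Python. It is only a sketch and not jena-text or Lucene code: the Triple type, the helper functions, the field names "label" and "comment", and the subject ex:thing1 are all made up for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for an indexed triple: subject, property field, literal text.
Triple = namedtuple("Triple", ["subject", "field", "text"])

def index_per_triple(triples):
    """Toy index: one document per triple, so each document has one text field."""
    return [{"uri": t.subject, t.field: t.text} for t in triples]

def matches(doc, field, term):
    """True if the document has the field and its text contains the term."""
    return term.lower() in doc.get(field, "").lower().split()

def search_or(docs, clauses):
    """One result row per document in which at least one clause matches."""
    return [d["uri"] for d in docs if any(matches(d, f, t) for f, t in clauses)]

def search_and(docs, clauses):
    """One result row per document in which every clause matches."""
    return [d["uri"] for d in docs if all(matches(d, f, t) for f, t in clauses)]

triples = [
    Triple("ex:thing1", "label", "red bicycle"),
    Triple("ex:thing1", "comment", "fast and light"),
]
docs = index_per_triple(triples)
clauses = [("label", "bicycle"), ("comment", "fast")]

or_hits = search_or(docs, clauses)
and_hits = search_and(docs, clauses)
print(or_hits)   # ['ex:thing1', 'ex:thing1']
print(and_hits)  # []
```

In this model the OR query matches both documents and so returns the same subject twice, while the AND query finds no single document that satisfies both clauses. That mirrors the behaviour described in the issue.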

Note that the recently added Elasticsearch backend for jena-text works differently: it consolidates triples with the same subject into a single document in the text index. But it has to do a lot of bookkeeping to keep the information synchronized. One consequence of this is that updates to the index are very slow compared with the Lucene backend (though a major factor in this is also that operations are performed via a REST API to the Elasticsearch server, whereas the Lucene backend lives in the same JVM). The Elasticsearch backend does support AND queries, so you may want to try it instead of using the Lucene backend.
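
For comparison, a consolidated layout can be sketched with the same kind of toy model. Again this is an assumption-laden illustration in Python, not the Elasticsearch backend's actual code: the SubjectIndex class and its methods are hypothetical, and everything happens in memory rather than through a REST API.

```python
class SubjectIndex:
    """Toy index: one document per subject, holding all of its text fields."""

    def __init__(self):
        self.docs = {}  # subject -> {field: [texts]}

    def add(self, subject, field, text):
        # Adding a triple means locating the subject's document and extending it.
        self.docs.setdefault(subject, {}).setdefault(field, []).append(text)

    def delete(self, subject, field, text):
        # Deleting a triple means editing the document, not simply dropping it.
        doc = self.docs.get(subject)
        if doc is None or text not in doc.get(field, []):
            return
        doc[field].remove(text)
        if not doc[field]:
            del doc[field]
        if not doc:
            del self.docs[subject]

    def search_and(self, clauses):
        """Subjects whose single document satisfies every (field, term) clause."""
        def has(doc, field, term):
            return any(term.lower() in v.lower().split() for v in doc.get(field, []))
        return [s for s, doc in self.docs.items()
                if all(has(doc, f, t) for f, t in clauses)]

index = SubjectIndex()
index.add("ex:thing1", "label", "red bicycle")
index.add("ex:thing1", "comment", "fast and light")
clauses = [("label", "bicycle"), ("comment", "fast")]

hits_before = index.search_and(clauses)
index.delete("ex:thing1", "comment", "fast and light")
hits_after = index.search_and(clauses)
print(hits_before)  # ['ex:thing1']
print(hits_after)   # []
```

Here the AND query succeeds because both fields live in one document, but every add and delete has to read and modify an existing document. That is the kind of bookkeeping that makes updates more expensive than in the per-triple layout.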

> Lucene text search across multiple fields ("AND") yields no results
> -------------------------------------------------------------------
>
>                 Key: JENA-1388
>                 URL: https://issues.apache.org/jira/browse/JENA-1388
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Jena 3.4.0
>         Environment: CentOS 7.3, OpenJDK 64-Bit, v1.8.0_141-b16
>            Reporter: Vilnis Termanis (Iotic Labs)
>              Labels: index, lucene, search
>         Attachments: config-fields.ttl, multi_field.ttl, multi_index.sparql
>
>
> Searching across two Lucene text indexed fields produces potentially unexpected results. (The following assumes that the string supplied to each field does match and is tied to the same uid/subject.)
> # A query across two fields with *OR* produces two equal rows
> # The same query but with *AND* produces no rows


