Posted to dev@jena.apache.org by "Osma Suominen (JIRA)" <ji...@apache.org> on 2017/03/04 17:41:46 UTC

[jira] [Commented] (JENA-1305) Elastic Search Support for Apache Jena Text

    [ https://issues.apache.org/jira/browse/JENA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895788#comment-15895788 ] 

Osma Suominen commented on JENA-1305:
-------------------------------------

This sounds great! From my perspective, even a smaller feature set would be acceptable, as long as basic text indexing functionality works.

One important thing is to have unit tests from the start. Luckily ES seems to provide good support for that in the form of a [testing framework|https://www.elastic.co/guide/en/elasticsearch/reference/current/testing-framework.html]. I hope you can make use of that (or something similar).

I hope you can make use of the existing jena-text Lucene code (and possibly the Solr code as well if it helps). In fact, I strongly suggest that you avoid duplicating code if at all possible, and instead try to implement the ES side so that it shares as much code as possible with the Lucene support. This may require some refactoring of existing code; I'm willing to help with that.

I also hope that you can make use of the existing Lucene unit tests. In my mind, the unit tests that test a specific feature (say, deleting indexed values) should be the same regardless of which backend (Lucene/ES) is being used. This may require some reengineering of the test classes so that their functionality and naming can become backend-independent. The inheritance hierarchy of the test classes is already quite convoluted, though, and I'm partially responsible for that. I can help with the tests as well.
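
As a rough illustration of the backend-independent test idea, here is a minimal sketch. All names in it (TextBackend, InMemoryBackend, DeleteIndexedValueContract) are made up for this example and are not part of the jena-text API; the in-memory class only stands in for a real Lucene or ES backend, and a real version would presumably be written as JUnit tests rather than a boolean check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical backend abstraction for this sketch; NOT the real jena-text API.
interface TextBackend {
    void add(String entity, String text);
    void delete(String entity, String text);
    List<String> search(String term);
}

// Stand-in backend. A Lucene- or ES-backed class would implement the same interface.
class InMemoryBackend implements TextBackend {
    private final Map<String, List<String>> docs = new HashMap<>();

    public void add(String entity, String text) {
        docs.computeIfAbsent(entity, k -> new ArrayList<>()).add(text);
    }

    public void delete(String entity, String text) {
        List<String> values = docs.get(entity);
        if (values != null) {
            values.remove(text);
        }
    }

    // Returns every entity that has at least one value containing the term.
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase();
        for (Map.Entry<String, List<String>> e : docs.entrySet()) {
            for (String value : e.getValue()) {
                if (value.toLowerCase().contains(needle)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }
}

// The feature check is written once, against the interface only.
abstract class DeleteIndexedValueContract {
    // Each backend supplies its own instance; the check itself never changes.
    protected abstract TextBackend createBackend();

    boolean deletedValueIsNoLongerFound() {
        TextBackend backend = createBackend();
        backend.add("http://example/e1", "green apple");
        backend.add("http://example/e2", "red apple");
        backend.delete("http://example/e1", "green apple");
        List<String> hits = backend.search("apple");
        return hits.size() == 1 && hits.contains("http://example/e2");
    }
}

// One thin subclass per backend; a hypothetical ES variant would differ only here.
class InMemoryDeleteContract extends DeleteIndexedValueContract {
    protected TextBackend createBackend() {
        return new InMemoryBackend();
    }
}
```

With this shape, adding a backend means adding one small subclass that overrides createBackend(), while the feature checks themselves stay shared.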

You can base your implementation on this branch:
https://github.com/osma/jena/tree/jena-1301-drop-solr
i.e. my branch, which contains the Lucene 6 upgrade (JENA-1250/PR #219) as well as the removal of Solr support (JENA-1301/PR #220). I expect to merge these to Jena master soon; I just want to give people a chance to comment and perhaps do some additional testing before merging.

Just a reminder: when the code is done, the [jena-text documentation|https://jena.apache.org/documentation/query/text-query.html] needs to be updated as well. There should also be example configuration files for jena-text with ES alongside the jena-text/Lucene examples.

> Elastic Search Support for Apache Jena Text 
> --------------------------------------------
>
>                 Key: JENA-1305
>                 URL: https://issues.apache.org/jira/browse/JENA-1305
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: Text
>    Affects Versions: Jena 3.2.0
>            Reporter: Anuj Kumar
>            Assignee: Osma Suominen
>              Labels: elasticsearch
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> This Jira tracks the development of the Jena Text ElasticSearch implementation.
> The goal is to extend Jena Text's capability to index, at scale, in ElasticSearch. This implementation would be similar to the Lucene and Solr implementations.
> We will use ES version 5.2.1 for the implementation.
> The following functionalities would be supported:
> * Indexing Literal values
> * Updating indexed values
> * Deleting indexed values
> * Custom Analyzer Support
> * Configuration using Assembler as well as Java techniques.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)