Posted to users@jena.apache.org by Osma Suominen <os...@helsinki.fi> on 2017/03/01 06:27:33 UTC

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Hi Anuj,

I understand your concerns. However, we also need to balance between the 
needs of individual modules/features and the whole codebase. I'm willing 
to put in the effort to keep the other modules up to date with newer 
Lucene versions. Lucene upgrade requirements are well documented; the 
only hitches seen in JENA-1250 were related to how jena-text (ab)used 
some Lucene features that were dropped from newer versions.

A perhaps stupid question to more experienced Java developers: is it 
even possible to mix modules that depend on different versions of the 
Lucene libraries within the same project? In my (quite limited) 
understanding of Java projects and libraries, this requires special 
arrangements (e.g. shading) as the Java package/class namespace is 
shared by all the code running within the same JVM.

So can you create, say, a Fuseki build that contains the current 
jena-text module (depending on Lucene 4.x) and the new jena-text-es 
module (depending on Lucene 6.4.1) without any compatibility issues?

-Osma
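
One way to check the concern above empirically is sketched here. It assumes the different Lucene versions arrive as separate JARs on the classpath and lists every location from which a given class could be loaded. The class name ClasspathCopies and its locate helper are made up for this illustration; org.apache.lucene.index.IndexWriter is used only as a representative Lucene class. More than one hit means two copies of the class are visible, and the JVM will typically use whichever comes first on the classpath. In a single merged (uber) JAR the duplicates have already been collapsed at build time, so this check would not reveal them there.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Jena or Lucene.
final class ClasspathCopies {

    /** Returns every location from which the named class could be loaded by the given loader. */
    static List<URL> locate(String className, ClassLoader loader) {
        // A class is stored as a resource: dots become slashes, plus ".class".
        String resource = className.replace('.', '/') + ".class";
        try {
            return new ArrayList<>(Collections.list(loader.getResources(resource)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Class to look for; defaults to a well-known Lucene class.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexWriter";
        List<URL> copies = locate(name, ClassLoader.getSystemClassLoader());
        System.out.println(name + ": " + copies.size() + " location(s) found");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
    }
}
```

Running this with the classpath of the build in question and getting two locations for the same Lucene class would confirm that both versions are present and that only one of them can win for that classloader.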



01.03.2017, 00:47, anuj kumar wrote:
> Hi,
>
> My 2 cents:
>
> The reason I proposed separate modules for Lucene, Solr and ES is
> exactly to avoid the "all or nothing" approach we need to take if we
> club them all together. If they stay together and in the near future I
> want to upgrade ES to another version, I also need to upgrade Lucene
> and Solr again, and possibly another implementation that may have been
> added in the meantime. As we all know, this means weeks of work, if not
> months, to get the changes released. This would personally demotivate me
> from doing anything, and I would probably start maintaining my own version
> of Jena-Text, as that would be much simpler than upgrading and testing and,
> in the process, owning (read: fixing bugs in) the upgrade for each and
> every technology.
>
> If they are developed as separate modules, they can evolve independently of
> each other and we can avoid situations where we can't upgrade to the latest
> version of Lucene because we do not know what effect it will have on the
> Solr implementation.
>
> We can start with a separate module for Jena Text ES and see how
> things go. If they go well, we could extract Solr and Lucene out of
> Jena Text.
>
> Again this is just a suggestion based on my limited industry experience.
>
> Thanks,
> Anuj Kumar
>
>
>
> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <os...@helsinki.fi>
> wrote:
>
>> 28.02.2017, 17:12, A. Soroka wrote:
>>
>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>> ? In other words, might it be better to factor out between -text and
>>> -spatial and _then_ try to upgrade the Lucene version?
>>>
>>
>> I certainly wouldn't object to that, but somebody has to volunteer to do
>> the actual work!
>>
>>> I don't use the Solr component now, but I could easily see so doing...
>>> that's pretty vague, I know, and I'm not in a position to do any work to
>>> maintain it, so consider that just a very small and blurry data point. :)
>>>
>>
>> Last time I tried it (it was a while ago) I couldn't figure out how to get
>> it running... If you could just try that with some toy data, then your data
>> point would be a lot less blurry :) I haven't used Solr for anything, so
>> I'm not very familiar with how to set it up, and the jena-text instructions
>> are pretty vague unfortunately.
>>
>>
>> -Osma
>>
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suominen@helsinki.fi
>> http://www.nationallibrary.fi
>>
>
>
>


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
Just FYI, I was able to index multiple fields in ElasticSearch using the
Jena Text capability.
The issue was in my ElasticSearch code, where I was doing an insert every
time instead of an update :/

Cheers!
Anuj Kumar
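
The insert-versus-update distinction mentioned above can be shown with a toy model. This is only a sketch: ToyIndex and its methods are hypothetical and use a plain in-memory map, not the Elasticsearch client API. It assumes, as one plausible reading of the problem, that each indexed property value reaches the index as a separate write for the same entity, so a full-document insert replaces the fields written earlier while an update merges into them.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a document index; not a real ES client.
final class ToyIndex {
    // Document id -> (field name -> value).
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    /** Insert semantics: the new one-field document replaces whatever was stored under the id. */
    void insert(String id, String field, String value) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(field, value);
        docs.put(id, doc);
    }

    /** Update semantics: the field is merged into the existing document, created if absent. */
    void update(String id, String field, String value) {
        docs.computeIfAbsent(id, key -> new LinkedHashMap<>()).put(field, value);
    }

    /** Returns the stored fields for the id, or an empty map if nothing is stored. */
    Map<String, String> get(String id) {
        return docs.getOrDefault(id, Collections.emptyMap());
    }

    public static void main(String[] args) {
        ToyIndex replacing = new ToyIndex();
        replacing.insert("ex:s", "label", "A label");
        replacing.insert("ex:s", "comment", "A comment");
        // The second insert replaced the document, so the label field is gone.
        System.out.println("insert: " + replacing.get("ex:s"));

        ToyIndex merging = new ToyIndex();
        merging.update("ex:s", "label", "A label");
        merging.update("ex:s", "comment", "A comment");
        // Both fields are kept in the same document.
        System.out.println("update: " + merging.get("ex:s"));
    }
}
```

Under these assumptions, querying the label field would find nothing in the first case even though the entity definition lists both fields, which matches the kind of symptom described earlier in the thread.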

On Wed, Mar 1, 2017 at 7:40 PM, anuj kumar <an...@gmail.com> wrote:

> Thanks Osma. I sent my previous email just a minute early. I will try your
> suggestion and if it doesn't work will send you the entire example.
>
> Thanks again.
> Anuj
>
> On 1 Mar 2017 19:36, "Osma Suominen" <os...@helsinki.fi> wrote:
>
>> Hi Anuj!
>>
>> Generally I use assembler descriptions to configure the jena-text index.
>> An example with multiple properties (SKOS label properties) is here:
>> https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#creating-a-text-index
>>
>> For examples on how to use assembler descriptions from Java code, take a
>> look at the jena-text unit tests. They generally contain a snippet of
>> assembler definition that configures the text index in a particular way,
>> then test that it does what it should when using that configuration.
>>
>> You didn't provide a full example. What is your data and what query did
>> you use? What results did you expect? What happened instead?
>>
>> One possible problem in your configuration is that you have set the
>> primary predicate to rdfs:label, but not set a field for it. Try adding
>> this:
>>
>> entDef.set("label", RDFS.label.asNode());
>>
>> For querying everything else but the default field, you need to specify
>> the predicate at query time. With your configuration, it should be possible
>> to query rdfs:comment values like this:
>>
>> ?s text:query (rdfs:comment "word") .
>>
>> Hope this helps!
>>
>> -Osma
>>
>> 01.03.2017, 17:33, anuj kumar wrote:
>>
>>> BTW, I have one more question:
>>>
>>> How do I add more than one field to be indexed in my index?
>>> Basically, if I want to index rdfs:label and rdfs:comment in the same
>>> index document, how do I do it?
>>>
>>> I tried:
>>>
>>> EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
>>> entDef.setPrimaryPredicate(RDFS.label);
>>> entDef.setGraphField(GRAPH_FIELD_NAME);
>>> entDef.set("comment", RDFS.comment.asNode());
>>>
>>> But it doesn't work. Can you please point me to a way to do it? This
>>> is an important piece of functionality I need.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>>
>>> On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <an...@gmail.com> wrote:
>>>
>>>> I personally have no preference as to how the code in Jena should be
>>>> structured, as long as I am able to use it :).
>>>> I do have a personal preference for doing it in a specific way because,
>>>> IMO, it is modular, which makes it much easier to maintain in the long
>>>> run. But again, it may not be the quickest one.
>>>>
>>>> I have already been given a deadline by the company to have the ES
>>>> extension implemented in the next 15 days :). What this means is that I
>>>> will be maintaining the ES code extension to Jena Text at least locally
>>>> for some time. I would be more than happy to contribute to the Jena
>>>> community whatever is required to have a proper ElasticSearch
>>>> implementation in place, whether within the jena-text module or as a
>>>> separate module. Until Lucene and Solr are upgraded to the latest
>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>
>>>> Cheers!
>>>> Anuj Kumar
>>>>
>>>>
>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>
>>>>> Osma--
>>>>>
>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>> different versions of code accessible in different ways. The longer
>>>>> answer is that it's probably not a viable alternative for Jena for this
>>>>> problem, at least not without a lot of other change.
>>>>>
>>>>> You are right to point to the classloader mechanism as being at the
>>>>> heart of this question, but I must alter your remark just slightly. From
>>>>> "the Java classloader only sees a single, flat package/class namespace
>>>>> and a set of compiled classes" to "ANY GIVEN Java classloader only sees a
>>>>> single, flat package/class namespace and a set of compiled classes".
>>>>>
>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>> module boundaries (and even dynamic module relationships at run-time).
>>>>> Each OSGi bundle sees its own classloader, and the framework is
>>>>> responsible for connecting bundles up to ensure that every bundle has
>>>>> what it needs in the way of types to function, based on metadata that
>>>>> the bundles provide to the framework. It's an incredibly powerful system
>>>>> (I use it every day and enjoy it enormously) but it's also very "heavy"
>>>>> and requires a good deal of investment to use. In particular, it's
>>>>> probably too large to put _inside_ Jena. (I frequently put Jena inside
>>>>> an OSGi instance, on the other hand.)
>>>>>
>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>> libraries. In theory, we could "roll our own" classloader management for
>>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>>> There might be another, more lightweight, toolkit out there to this
>>>>> purpose, but I'm not aware of any myself.
>>>>>
>>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
>>>>> hardly a thing we want to do any more of than needed, I don't think.
>>>>>
>>>>> ---
>>>>> A. Soroka
>>>>> The University of Virginia Library
>>>>>
>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>
>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>>
>>>>>> Hi Anuj!
>>>>>>
>>>>>> Thanks for the clarification.
>>>>>>
>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>> convenient ways to structure a Java project. Maven cannot change the
>>>>>> fact that at runtime, module divisions don't really matter (except that
>>>>>> they usually correspond to package sub-namespaces) and the Java
>>>>>> classloader only sees a single, flat package/class namespace and a set
>>>>>> of compiled classes (usually within JARs) in the classpath that it
>>>>>> needs to check to find the right classes, and if there are two versions
>>>>>> of the same library (eg Lucene) with overlapping class names, that's
>>>>>> going to cause trouble. The only way around that is to shade some of
>>>>>> the libraries, i.e. rename them so that they end up in another,
>>>>>> non-conflicting namespace. Apparently Elasticsearch also did some of
>>>>>> that in the past [1] but nowadays tries to avoid it.
>>>>>>
>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>>>> modules with all their dependencies, one of which requires the Lucene
>>>>>> 4.x classes and the other one the Lucene 6.4.1 classes? How do you
>>>>>> ensure that only one of them is used at a time, and that the Java
>>>>>> classloader, even though it has access to both versions of Lucene, only
>>>>>> loads classes from the single, correct one and not the other? Or do you
>>>>>> need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that
>>>>>> you don't end up with two Lucene versions within the same Fuseki JAR?
>>>>>>
>>>>>> -Osma
>>>>>>
>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>
>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>
>>>>>>> Hi Osma,
>>>>>>>
>>>>>>> I understand what you are saying. There are ways to mitigate risks and
>>>>>>> balance the refactoring without affecting the existing modules. But I
>>>>>>> will not delve into those now. I am not enough of an expert in Jena to
>>>>>>> say convincingly that it is possible without any hiccups. But I can
>>>>>>> take a guess and say that it is indeed possible :)
>>>>>>>
>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>
>>>>>>> I actually do not understand what you mean by mixing modules. I assume
>>>>>>> you mean having jena-text and jena-text-es as dependencies in a build
>>>>>>> without causing the build to conflict. If that is what you mean, then
>>>>>>> the answer is yes, it is possible and quite simple as well. Let me
>>>>>>> explain how it is possible. But before that, some assumptions which I
>>>>>>> want to call out explicitly.
>>>>>>>
>>>>>>> *Assumptions:*
>>>>>>> 1. At a given point in time, only a single Indexing Technology is used
>>>>>>> for text based indexing and searching via Jena. What this means is
>>>>>>> that we will either use Lucene Implementation OR Solr Implementation
>>>>>>> OR ES Implementation at any given point in time.
>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>>> but only on jena-text classes, if at all.
>>>>>>>
>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>> contains jena-text based common classes + ES specific classes without
>>>>>>> any compatibility issues. And it is in fact quite simple. I did it in
>>>>>>> the current jena-text-es module and ran the entire build, which
>>>>>>> succeeded. The key is to include the latest Lucene dependencies at the
>>>>>>> very beginning in the pom and then include the jena-text dependency.
>>>>>>> Maven will then automatically resolve the dependency issues by
>>>>>>> including the Lucene libraries that we included in our ES-specific pom.
>>>>>>> Have a look at the pom of the jena-text-es module here to see how it
>>>>>>> can be done:
>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anuj Kumar
>>>>>>>
>>>>>>>


-- 
*Anuj Kumar*

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
Thanks Osma. I sent my previous email just a minute early. I will try your
suggestion and if it doesn't work will send you the entire example.

Thanks again.
Anuj

On 1 Mar 2017 19:36, "Osma Suominen" <os...@helsinki.fi> wrote:

> Hi Anuj!
>
> Generally I use assembler descriptions to configure the jena-text index.
> An example with multiple properties (SKOS label properties) is here:
> https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#cre
> ating-a-text-index
>
> For examples on how to use assembler descriptions from Java code, take a
> look at the jena-text unit tests. They generally contain a snippet of
> assembler definition that configures the text index in a particular way,
> then test that it does what it should when using that configuration.
>
> You didn't provide a full example. What is your data and what query did
> you use? What results did you expect? What happened instead?
>
> One possible problem in your configuration is that you have set the
> primary predicate to rdfs:label, but not set a field for it. Try adding
> this:
>
> entDef.set("label", RDFS.label.asNode());
>
> For querying everything else but the default field, you need to specify
> the predicate at query time. With your configuration, it should be possible
> to query rdfs:comment values like this:
>
> ?s text:query (rdfs:comment "word") .
>
> Hope this helps!
>
> -Osma
>
> 01.03.2017, 17:33, anuj kumar kirjoitti:
>
>> BTW, I have one more question:
>>
>> How do I add more than one field to be indexed in my Index?
>> Basically, if I want to index rdfs:label , rdfs:comment in the same index
>> document, how do I do it?
>>
>> I tried :
>>
>> EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
>> entDef.setPrimaryPredicate(RDFS.label);
>> entDef.setGraphField(GRAPH_FIELD_NAME);
>> entDef.set("comment", RDFS.comment.asNode());
>>
>> But it doesnt work. Can you please point me on a way to do it please. This
>> is an important piece of functionality I need.
>>
>> Thanks,
>> Anuj Kumar
>>
>>
>> On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <an...@gmail.com>
>> wrote:
>>
>> I personally have no preference as to how the code in Jena should be
>>> structured, as long as I am able to use it :).
>>> I have personal preference of doing it in a specific way because IMO, it
>>> is modular which makes it much easier to maintain in the long run. But
>>> again it may not be the quickest one.
>>>
>>> I already have been given a deadline, by the company to have ES extension
>>> implemented in the next 15 days :). What this means is that I will be
>>> maintaining the ES code extension to Jena Text at-least locally for a
>>> coming period of time. I would be more than happy to contribute to Jena
>>> community whatever is required to have a proper ElasticSearch
>>> Implementation in place, whether within jena-text module or as a separate
>>> module. Till the time Lucene and Solr is not upgraded to the latest
>>> version, I will have to maintain a separate module for jena-text-es.
>>>
>>> Cheers!
>>> Anuj Kumar
>>>
>>>
>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>
>>> Osma--
>>>>
>>>> The short answer is that yes, given the right tools you _can_ have
>>>> different versions of code accessible in different ways. The longer
>>>> answer
>>>> is that it's probably not a viable alternative for Jena for this
>>>> problem,
>>>> at least not without a lot of other change.
>>>>
>>>> You are right to point to the classloader mechanism as being at the
>>>> heart
>>>> of this question, but I must alter your remark just slightly. From "the
>>>> Java classloader only sees a single, flat package/class namespace and a
>>>> set
>>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>>> flat package/class namespace and a set of compiled classes".
>>>>
>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>> module boundaries (and even dynamic module relationships at run-time).
>>>> Each
>>>> OSGi bundle sees its own classloader, and the framework is responsible
>>>> for
>>>> connecting bundles up to ensure that every bundle has what it needs in
>>>> the
>>>> way of types to function, based on metadata that the bundles provide to
>>>> the
>>>> framework. It's an incredibly powerful system (I use it every day and
>>>> enjoy
>>>> it enormously) but it's also very "heavy" and requires a good deal of
>>>> investment to use. In particular, it's probably too large to put
>>>> _inside_
>>>> Jena. (I frequently put Jena inside an OSGi instance, on the other
>>>> hand.)
>>>>
>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>> this kind, but it's really meant for the JDK itself, not application
>>>> libraries. In theory, we could "roll our own" classloader management for
>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>> There might be another, more lightweight, toolkit out there to this
>>>> purpose, but I'm not aware of any myself.
>>>>
>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>> for
>>>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly
>>>> a
>>>> thing we want to do any more of than needed, I don't think.
>>>>
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>>
>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>
>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi>
>>>>>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Anuj!
>>>>>
>>>>> Thanks for the clarification.
>>>>>
>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>
>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>> convenient ways to structure a Java project. Maven cannot change the
>>>> fact
>>>> that at runtime, module divisions don't really matter (except that they
>>>> usually correspond to package sub-namespaces) and the Java classloader
>>>> only
>>>> sees a single, flat package/class namespace and a set of compiled
>>>> classes
>>>> (usually within JARs) in the classpath that it needs to check to find
>>>> the
>>>> right classes, and if there are two versions of the same library (eg
>>>> Lucene) with overlapping class names, that's going to cause trouble. The
>>>> only way around that is to shade some of the libraries, i.e. rename
>>>> them so
>>>> that they end up in another, non-conflicting namespace. Apparently
>>>> Elasticsearch also did some of that in the past [1] but nowadays tries
>>>> to
>>>> avoid it.
>>>>
>>>>>
>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>
>>>> Indexing Technology is used") imply that in the assembler configuration,
>>>> you cannot have ja:loadClass declarations for both Lucene and ES
>>>> backends?
>>>> Or how do you run something like Fuseki that contains (in a single big
>>>> JAR)
>>>> both the jena-text and jena-text-es modules with all their dependencies,
>>>> one of which requires the Lucene 4.x classes and the other one the
>>>> Lucene
>>>> 6.4.1 classes? How do you ensure that only one of them is used at a
>>>> time,
>>>> and that the Java classloader, even though it has access to both
>>>> versions
>>>> of Lucene, only loads classes from the single, correct one and not the
>>>> other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
>>>> packages, so that you don't end up with two Lucene versions within the
>>>> same
>>>> Fuseki JAR?
>>>>
>>>>>
>>>>> -Osma
>>>>>
>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>
>>>>> 01.03.2017, 11:03, anuj kumar kirjoitti:
>>>>>
>>>>>> Hi Osma,
>>>>>>
>>>>>> I understand what you are saying. There are ways to mitigate risks and
>>>>>> balance the refactoring without affecting the existing modules. But I
>>>>>>
>>>>> will
>>>>
>>>>> not delve into those now. I am not an expert in Jena to convincingly
>>>>>>
>>>>> say
>>>>
>>>>> that it is possible, without any hiccups. But I can take a guess and
>>>>>>
>>>>> say
>>>>
>>>>> that it is indeed possible :)
>>>>>>
>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>
>>>>>> I actually do not understand what you mean by mixing modules. I assume
>>>>>>
>>>>> you
>>>>
>>>>> mean having jena-text and jena-text-es as dependencies in a build
>>>>>>
>>>>> without
>>>>
>>>>> causing the build to conflict. If that is what you mean than the
>>>>>>
>>>>> answer is
>>>>
>>>>> yes it is possible and quite simple as well. Let me explain how it is
>>>>>> possible. But before that some assumption which I want to call out
>>>>>> explicitly.
>>>>>>
>>>>>> *Assumption:*
>>>>>> 1. At a given point in time, only a single Indexing Technology is used
>>>>>>
>>>>> for
>>>>
>>>>> text based indexing and searching via Jean. What this means is that we
>>>>>>
>>>>> will
>>>>
>>>>> either use Lucene Implementation OR Solr Implementation OR ES
>>>>>> Implementation at any given point in time.
>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>>
>>>>> but
>>>>
>>>>> only on jena-text classes, if at all.
>>>>>>
>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>
>>>>> contains
>>>>
>>>>> jena-text based common classes + ES specific classes without any
>>>>>> compatibility issues. And it is infact quite simple. I did it in the
>>>>>> current jena-text-es module and ran the entire build which succeeded.
>>>>>> The key is to include the latest Lucene dependencies at the very
>>>>>>
>>>>> beginning
>>>>
>>>>> in the pom and then include jena-text dependency. Maven will then
>>>>>> automatically resolve the dependency issues by including the Lucene
>>>>>> librarires that we included in our es specific pom. Have a look the
>>>>>>
>>>>> pom of
>>>>
>>>>> jena-text-es module here to see how it can be done :
>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Anuj Kumar
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Osma Suominen
>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>> National Library of Finland
>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>> 00014 HELSINGIN YLIOPISTO
>>>>> Tel. +358 50 3199529
>>>>> osma.suominen@helsinki.fi
>>>>> http://www.nationallibrary.fi
>>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> *Anuj Kumar*
>>>
>>>
>>
>>
>>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suominen@helsinki.fi
> http://www.nationallibrary.fi
>

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Anuj!

Generally I use assembler descriptions to configure the jena-text index. 
An example with multiple properties (SKOS label properties) is here: 
https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#creating-a-text-index

For examples on how to use assembler descriptions from Java code, take a 
look at the jena-text unit tests. They generally contain a snippet of 
assembler definition that configures the text index in a particular way, 
then test that it does what it should when using that configuration.

You didn't provide a full example. What is your data and what query did 
you use? What results did you expect? What happened instead?

One possible problem in your configuration is that you have set the 
primary predicate to rdfs:label, but not set a field for it. Try adding 
this:

entDef.set("label", RDFS.label.asNode());

To query any field other than the default one, you need to specify 
the predicate at query time. With your configuration, it should be 
possible to query rdfs:comment values like this:

?s text:query (rdfs:comment "word") .
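
A minimal Java sketch of the suggestion above, for illustration only. It 
assumes jena-text is on the classpath; the class name and the field names 
("uri", "label", "graph", "comment") are made up for this example and are 
not taken from your code. It registers rdfs:label under the default field 
and rdfs:comment under a second field, then prints the field each 
predicate maps to:

```java
import org.apache.jena.query.text.EntityDefinition;
import org.apache.jena.vocabulary.RDFS;

public class MultiFieldEntityDefinition {
    // Hypothetical index field names, chosen only for this sketch.
    static final String ENTITY_FIELD = "uri";
    static final String DEFAULT_FIELD = "label";
    static final String GRAPH_FIELD = "graph";
    static final String COMMENT_FIELD = "comment";

    /** Builds an entity definition that indexes both rdfs:label and rdfs:comment. */
    public static EntityDefinition build() {
        EntityDefinition entDef = new EntityDefinition(ENTITY_FIELD, DEFAULT_FIELD);
        entDef.setGraphField(GRAPH_FIELD);
        // Explicitly map the default field to rdfs:label.
        entDef.set(DEFAULT_FIELD, RDFS.label.asNode());
        // Map a second field to rdfs:comment.
        entDef.set(COMMENT_FIELD, RDFS.comment.asNode());
        return entDef;
    }

    public static void main(String[] args) {
        EntityDefinition entDef = build();
        // Look up which index field each predicate is stored in.
        System.out.println("rdfs:label   -> " + entDef.getField(RDFS.label.asNode()));
        System.out.println("rdfs:comment -> " + entDef.getField(RDFS.comment.asNode()));
    }
}
```

With a definition along these lines, the non-default "comment" field would 
be searched by naming its predicate, as in the text:query pattern above.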

Hope this helps!

-Osma

01.03.2017, 17:33, anuj kumar wrote:
> BTW, I have one more question:
>
> How do I add more than one field to be indexed in my Index?
> Basically, if I want to index rdfs:label , rdfs:comment in the same index
> document, how do I do it?
>
> I tried :
>
> EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
> entDef.setPrimaryPredicate(RDFS.label);
> entDef.setGraphField(GRAPH_FIELD_NAME);
> entDef.set("comment", RDFS.comment.asNode());
>
> But it doesn't work. Can you please point me to a way to do it? This
> is an important piece of functionality I need.
>
> Thanks,
> Anuj Kumar
>
>
> On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <an...@gmail.com> wrote:
>
>> I personally have no preference as to how the code in Jena should be
>> structured, as long as I am able to use it :).
>> I have a personal preference for doing it in a specific way because, IMO,
>> it is modular, which makes it much easier to maintain in the long run. But
>> again, it may not be the quickest one.
>>
>> I have already been given a deadline by the company to have the ES
>> extension implemented in the next 15 days :). What this means is that I
>> will be maintaining the ES code extension to Jena Text at least locally
>> for the coming period. I would be more than happy to contribute to the
>> Jena community whatever is required to have a proper ElasticSearch
>> implementation in place, whether within the jena-text module or as a
>> separate module. Until Lucene and Solr are upgraded to the latest
>> version, I will have to maintain a separate module for jena-text-es.
>>
>> Cheers!
>> Anuj Kumar
>>
>>
>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>
>>> Osma--
>>>
>>> The short answer is that yes, given the right tools you _can_ have
>>> different versions of code accessible in different ways. The longer answer
>>> is that it's probably not a viable alternative for Jena for this problem,
>>> at least not without a lot of other change.
>>>
>>> You are right to point to the classloader mechanism as being at the heart
>>> of this question, but I must alter your remark just slightly. From "the
>>> Java classloader only sees a single, flat package/class namespace and a set
>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>> flat package/class namespace and a set of compiled classes".
>>>
>>> This is the fact that OSGi uses to make it possible to maintain strict
>>> module boundaries (and even dynamic module relationships at run-time). Each
>>> OSGi bundle sees its own classloader, and the framework is responsible for
>>> connecting bundles up to ensure that every bundle has what it needs in the
>>> way of types to function, based on metadata that the bundles provide to the
>>> framework. It's an incredibly powerful system (I use it every day and enjoy
>>> it enormously) but it's also very "heavy" and requires a good deal of
>>> investment to use. In particular, it's probably too large to put _inside_
>>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>>
>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>> this kind, but it's really meant for the JDK itself, not application
>>> libraries. In theory, we could "roll our own" classloader management for
>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>> There might be another, more lightweight, toolkit out there to this
>>> purpose, but I'm not aware of any myself.
>>>
>>> Otherwise, yes, you get into shading and the like. We have to do that for
>>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>>> thing we want to do any more of than needed, I don't think.
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>
>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>
>>>> Hi Anuj!
>>>>
>>>> Thanks for the clarification.
>>>>
>>>> However, I'm still not sure I understand the situation completely. I
>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>> convenient ways to structure a Java project. Maven cannot change the
>>>> fact that at runtime, module divisions don't really matter (except that
>>>> they usually correspond to package sub-namespaces) and the Java
>>>> classloader only sees a single, flat package/class namespace and a set
>>>> of compiled classes (usually within JARs) in the classpath that it
>>>> needs to check to find the right classes, and if there are two versions
>>>> of the same library (e.g. Lucene) with overlapping class names, that's
>>>> going to cause trouble. The only way around that is to shade some of
>>>> the libraries, i.e. rename them so that they end up in another,
>>>> non-conflicting namespace. Apparently Elasticsearch also did some of
>>>> that in the past [1] but nowadays tries to avoid it.
>>>>
>>>> Does your assumption 1 ("At a given point in time, only a single
>>>> Indexing Technology is used") imply that in the assembler
>>>> configuration, you cannot have ja:loadClass declarations for both
>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>> modules with all their dependencies, one of which requires the Lucene
>>>> 4.x classes and the other one the Lucene 6.4.1 classes? How do you
>>>> ensure that only one of them is used at a time, and that the Java
>>>> classloader, even though it has access to both versions of Lucene, only
>>>> loads classes from the single, correct one and not the other? Or do you
>>>> need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that
>>>> you don't end up with two Lucene versions within the same Fuseki JAR?
>>>>
>>>> -Osma
>>>>
>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>
>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>> Hi Osma,
>>>>>
>>>>> I understand what you are saying. There are ways to mitigate risks and
>>>>> balance the refactoring without affecting the existing modules. But I
>>>>> will not delve into those now. I am not enough of an expert in Jena to
>>>>> say convincingly that it is possible without any hiccups. But I can
>>>>> take a guess and say that it is indeed possible :)
>>>>>
>>>>> For the question: "is it even possible to mix modules that depend on
>>>>> different versions of the Lucene libraries within the same project?"
>>>>>
>>>>> I actually do not understand what you mean by mixing modules. I assume
>>>>> you mean having jena-text and jena-text-es as dependencies in a build
>>>>> without causing the build to conflict. If that is what you mean, then
>>>>> the answer is yes, it is possible and quite simple as well. Let me
>>>>> explain how it is possible. But before that, some assumptions which I
>>>>> want to call out explicitly.
>>>>>
>>>>> *Assumptions:*
>>>>> 1. At a given point in time, only a single indexing technology is used
>>>>> for text-based indexing and searching via Jena. What this means is
>>>>> that we will use either the Lucene implementation OR the Solr
>>>>> implementation OR the ES implementation at any given point in time.
>>>>> 2. The Fuseki build does not depend on any Lucene 4.9.1 specific
>>>>> classes but only on jena-text classes, if at all.
>>>>>
>>>>> Based on these assumptions it is possible to create a build that
>>>>> contains the jena-text common classes + ES-specific classes without
>>>>> any compatibility issues. And it is in fact quite simple. I did it in
>>>>> the current jena-text-es module and ran the entire build, which
>>>>> succeeded.
>>>>> The key is to include the latest Lucene dependencies at the very
>>>>> beginning of the pom and then include the jena-text dependency. Maven
>>>>> will then automatically resolve the dependency issues by including the
>>>>> Lucene libraries that we included in our ES-specific pom. Have a look
>>>>> at the pom of the jena-text-es module here to see how it can be done:
>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Anuj Kumar
>>>>>
>>>>>
>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
>>> osma.suominen@helsinki.fi>
>>>>> wrote:
>>>>>
>>>>>> Hi Anuj,
>>>>>>
>>>>>> I understand your concerns. However, we also need to balance between
>>> the
>>>>>> needs of individual modules/features and the whole codebase. I'm
>>> willing to
>>>>>> put in the effort to keep the other modules up to date with newer
>>> Lucene
>>>>>> versions. Lucene upgrade requirements are well documented, the only
>>> hitches
>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>>> features that were dropped from newer versions.
>>>>>>
>>>>>> A perhaps stupid question to more experienced Java developers: is it
>>> even
>>>>>> possible to mix modules that depend on different versions of the
>>> Lucene
>>>>>> libraries within the same project? In my (quite limited)
>>> understanding of
>>>>>> Java projects and libraries, this requires special arrangements (e.g.
>>>>>> shading) as the Java package/class namespace is shared by all the code
>>>>>> running within the same JVM.
>>>>>>
>>>>>> So can you create, say, a Fuseki build that contains the current
>>> jena-text
>>>>>> module (depending on Lucene 4.x) and the new jena-text-es module
>>> (depending
>>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>>
>>>>>> -Osma
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> My 2 Cents :
>>>>>>>
>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and
>>> ES is
>>>>>>> exactly for avoiding the "All or Nothing" approach we need to take
>>> if we
>>>>>>> club them all together. If they stay together and if in the near
>>> future I
>>>>>>> want to upgrade ES to another version, I also need to again upgrade
>>> Lucene
>>>>>>> and Solr and possibly another implementation that may have been added
>>>>>>> during the time. As we all know, this means weeks of work if not
>>> months to
>>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>>> anything and I will probably start maintaining my version of
>>> Jena-Text as
>>>>>>> that would be much simpler to do than to upgrade and test and in the
>>>>>>> process own(read fix bugs) the upgrade for each and every technology.
>>>>>>>
>>>>>>> If they are developed as separate modules, they can evolve
>>> independently
>>>>>>> of
>>>>>>> each other and we can avoid situations where we can't upgrade to
>>> latest
>>>>>>> version of Lucene because we do not know what effect it will have on
>>> Solr
>>>>>>> Implementation.
>>>>>>>
>>>>>>> We can start with having a separate Module for Jena Text ES and see
>>> how
>>>>>>> things go. If they go well, we could extract out Solr and Lucene out
>>> of
>>>>>>> Jena Text.
>>>>>>>
>>>>>>> Again this is just a suggestion based on my limited industry
>>> experience.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anuj Kumar
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
>>> osma.suominen@helsinki.fi
>>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>
>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
>>>>>>>>> bb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apa
>>> che.org%3E
>>>>>>>>> ? In other words, might it be better to factor out between -text
>>> and
>>>>>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer
>>> to do
>>>>>>>> the actual work!
>>>>>>>>
>>>>>>>> I don't use the Solr component now, but I could easily see so
>>> doing...
>>>>>>>>
>>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any
>>> work to
>>>>>>>>> maintain it, so consider that just a very small and blurry data
>>> point.
>>>>>>>>> :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how
>>> to
>>>>>>>> get
>>>>>>>> it running... If you could just try that with some toy data, then
>>> your
>>>>>>>> data
>>>>>>>> point would be a lot less blurry :) I haven't used Solr for
>>> anything, so
>>>>>>>> I'm not very familiar with how to set it up, and the jena-text
>>>>>>>> instructions
>>>>>>>> are pretty vague unfortunately.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Osma Suominen
>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>> National Library of Finland
>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>> Tel. +358 50 3199529
>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Osma Suominen
>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>> National Library of Finland
>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>> Tel. +358 50 3199529
>>>>>> osma.suominen@helsinki.fi
>>>>>> http://www.nationallibrary.fi
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Osma Suominen
>>>> D.Sc. (Tech), Information Systems Specialist
>>>> National Library of Finland
>>>> P.O. Box 26 (Kaikukatu 4)
>>>> 00014 HELSINGIN YLIOPISTO
>>>> Tel. +358 50 3199529
>>>> osma.suominen@helsinki.fi
>>>> http://www.nationallibrary.fi
>>>
>>>
>>
>>
>> --
>> *Anuj Kumar*
>>
>
>
>


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Dropping Solr? Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by "A. Soroka" <aj...@virginia.edu>.
+1. As I wrote in an earlier message, I have a few vague and hazy future use cases for Solr with Jena, but even for those, I can easily imagine using external machinery like Apache Camel to integrate the two stores. It's just not Jena's job to do that.

Especially with the advent of support for ElasticSearch, but honestly, even without it. Solr has gotten progressively harder to support over the years as idiosyncrasies in that project pile up, and in version 6, they don't even bother to create a portable application.

---
A. Soroka
The University of Virginia Library

> On Mar 3, 2017, at 8:06 AM, Bruno P. Kinoshita <ki...@apache.org> wrote:
> 
> I think the only case recently where I had a project with Solr was when working with a Cloudera cluster (I believe it used to come pre-configured as a bundle of Cloudera Manager). But even then we had an ElasticSearch for the initial analysis before importing into the cluster.
> 
> I find ES easier to use. To be honest, even using Lucene API directly in Python and Java sometimes is easier for me :) so my +1
> 
> Cheers
> Bruno
> 
> 
> ----- Original Message -----
> From: Osma Suominen <os...@helsinki.fi>
> To: dev@jena.apache.org
> Sent: Saturday, 4 March 2017 1:57 AM
> Subject: Dropping Solr? Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine
> 
> Hi Anuj,
> 
> You may be right about Solr. I didn't look at it at all and there are no 
> Solr-specific unit tests.
> 
> I would like to drop Solr support altogether from jena-text. It has 
> already bitrotted quite a bit, doesn't support any of the features added 
> to jena-text in recent years, and from my perspective Elasticsearch 
> would be a much better alternative if you need an external text index. 
> Opinions?
> 
> Regarding issues.apache.org, you can register here:
> https://issues.apache.org/jira/secure/Dashboard.jspa
> There should be instructions available for everything.
> 
> -Osma
> 
> 
> 
> 03.03.2017, 14:23, anuj kumar wrote:
>> Hi Osma,
>> I briefly looked at the pull request. I believe we need to upgrade Lucene
>> and Solr in one go, don't we? The reason being that Solr 4.9.1 depends on
>> Lucene 4.9.1.
>> 
>> Also, how do I log into issues.apache.org, and where do I file this bug?
>> 
>> Thanks,
>> Anuj Kumar
>> 
>> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <os...@helsinki.fi>
>> wrote:
>> 
>>> Hi Anuj,
>>> 
>>> It's great that we found agreement over this!
>>> 
>>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
>>> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
>>> as an intermediate step). I'll wait for comments on the PR and if people
>>> think it's OK I will merge it soon to Jena master. Meanwhile, you can
>>> already base your ES implementation on that branch [2] if you like.
>>> 
>>> Could you please open a JIRA issue on issues.apache.org explaining the
>>> Elasticsearch support feature, so that we have a place for tracking this
>>> work, request comments etc.
>>> 
>>> Also I suggest we move the discussion around this to the developers' list (
>>> dev@jena.apache.org) where it's more appropriate.
>>> 
>>> -Osma
>>> 
>>> [1] https://github.com/apache/jena/pull/219
>>> 
>>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>> 
>>> 
>>> 03.03.2017, 02:45, anuj kumar wrote:
>>> 
>>>> I second that. I am now finalising the integration of ES and should have a
>>>> good production quality implementation ready in a week's time. At that
>>>> time I would want you guys to have a look at the implementation and
>>>> provide feedback. Once you guys have upgraded Lucene to 6.4.1, I can
>>>> merge the code into the jena-text module and do a round of testing.
>>>> 
>>>> Thanks,
>>>> Anuj Kumar
>>>> 
>>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>> 
>>>>> I do agree that trying to juggle different versions of Lucene libraries
>>>>> is probably not a realistic option right now. Luckily (if I understand
>>>>> the conversation thus far correctly) we have a solid alternative;
>>>>> getting our current Lucene dependency upgraded should allow us to
>>>>> (eventually) merge Anuj's work into the mainstream of development.
>>>>> Someone please tell me if I have that wrong! :grin:
>>>>> 
>>>>> Let me reiterate that this seems like very good work and speaking for
>>>>> myself, I certainly want to get it included into Jena. It's just a
>>>>> question of fitting it in correctly, which might take a bit of time.
>>>>> 
>>>>> ---
>>>>> A. Soroka
>>>>> The University of Virginia Library
>>>>> 
>>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>>> 
>>>>>> Hi Anuj!
>>>>>> 
>>>>>> I have nothing against modularity in general. However, I cannot see how
>>>>>> your proposal could work in practice for the Fuseki build, due to the
>>>>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>>> 
>>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>>> again. If all current Jena modules (i.e. jena-text and jena-spatial)
>>>>>> were upgraded to Lucene 6.4.1, then you could just add your ES classes
>>>>>> to jena-text, right? I think that would be better for everyone than
>>>>>> having to maintain your own separate module.
>>>>>> 
>>>>>> -Osma
>>>>>> 
>>>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>>> 
>>>>>>> I personally have no preference as to how the code in Jena should be
>>>>>>> structured, as long as I am able to use it :).
>>>>>>> I have a personal preference for doing it in a specific way because,
>>>>>>> IMO, it is modular, which makes it much easier to maintain in the long
>>>>>>> run. But again, it may not be the quickest one.
>>>>>>> 
>>>>>>> I have already been given a deadline by the company to have the ES
>>>>>>> extension implemented in the next 15 days :). What this means is that
>>>>>>> I will be maintaining the ES code extension to Jena Text at least
>>>>>>> locally for the coming period. I would be more than happy to
>>>>>>> contribute to the Jena community whatever is required to have a
>>>>>>> proper ElasticSearch implementation in place, whether within the
>>>>>>> jena-text module or as a separate module. Until Lucene and Solr are
>>>>>>> upgraded to the latest version, I will have to maintain a separate
>>>>>>> module for jena-text-es.
>>>>>>> 
>>>>>>> Cheers!
>>>>>>> Anuj Kumar
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>>> 
>>>>>>>> Osma--
>>>>>>>> 
>>>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>>>> different versions of code accessible in different ways. The longer
>>>>>>>> answer is that it's probably not a viable alternative for Jena for
>>>>>>>> this problem, at least not without a lot of other change.
>>>>>>>> 
>>>>>>>> You are right to point to the classloader mechanism as being at the
>>>>>>>> heart of this question, but I must alter your remark just slightly.
>>>>>>>> From "the Java classloader only sees a single, flat package/class
>>>>>>>> namespace and a set of compiled classes" to "ANY GIVEN Java
>>>>>>>> classloader only sees a single, flat package/class namespace and a
>>>>>>>> set of compiled classes".
>>>>>>>> 
>>>>>>>> This is the fact that OSGi uses to make it possible to maintain
>>>>>>>> strict module boundaries (and even dynamic module relationships at
>>>>>>>> run-time). Each OSGi bundle sees its own classloader, and the
>>>>>>>> framework is responsible for connecting bundles up to ensure that
>>>>>>>> every bundle has what it needs in the way of types to function,
>>>>>>>> based on metadata that the bundles provide to the framework. It's an
>>>>>>>> incredibly powerful system (I use it every day and enjoy it
>>>>>>>> enormously) but it's also very "heavy" and requires a good deal of
>>>>>>>> investment to use. In particular, it's probably too large to put
>>>>>>>> _inside_ Jena. (I frequently put Jena inside an OSGi instance, on
>>>>>>>> the other hand.)
>>>>>>>> 
>>>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization
>>>>>>>> of this kind, but it's really meant for the JDK itself, not
>>>>>>>> application libraries. In theory, we could "roll our own"
>>>>>>>> classloader management for this problem. That sounds like more than
>>>>>>>> a bit of a rabbit hole to me. There might be another, more
>>>>>>>> lightweight, toolkit out there to this purpose, but I'm not aware of
>>>>>>>> any myself.
>>>>>>>> 
>>>>>>>> Otherwise, yes, you get into shading and the like. We have to do
>>>>>>>> that for Guava for now because of HADOOP-10101 (grumble grumble) but
>>>>>>>> it's hardly a thing we want to do any more of than needed, I don't
>>>>>>>> think.
>>>>>>>> 
>>>>>>>> ---
>>>>>>>> A. Soroka
>>>>>>>> The University of Virginia Library
>>>>>>>> 
>>>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>>> 
>>>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Anuj!
>>>>>>>>> 
>>>>>>>>> Thanks for the clarification.
>>>>>>>>> 
>>>>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>>>>> convenient ways to structure a Java project. Maven cannot change the
>>>>>>>>> fact that at runtime, module divisions don't really matter (except
>>>>>>>>> that they usually correspond to package sub-namespaces) and the Java
>>>>>>>>> classloader only sees a single, flat package/class namespace and a
>>>>>>>>> set of compiled classes (usually within JARs) in the classpath that
>>>>>>>>> it needs to check to find the right classes, and if there are two
>>>>>>>>> versions of the same library (e.g. Lucene) with overlapping class
>>>>>>>>> names, that's going to cause trouble. The only way around that is to
>>>>>>>>> shade some of the libraries, i.e. rename them so that they end up in
>>>>>>>>> another, non-conflicting namespace. Apparently Elasticsearch also
>>>>>>>>> did some of that in the past [1] but nowadays tries to avoid it.
>>>>>>>>> 
>>>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>>>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>>>>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>>>>>>> modules with all their dependencies, one of which requires the
>>>>>>>>> Lucene 4.x classes and the other one the Lucene 6.4.1 classes? How
>>>>>>>>> do you ensure that only one of them is used at a time, and that the
>>>>>>>>> Java classloader, even though it has access to both versions of
>>>>>>>>> Lucene, only loads classes from the single, correct one and not the
>>>>>>>>> other? Or do you need to have separate "Fuseki-Lucene" and
>>>>>>>>> "Fuseki-ES" packages, so that you don't end up with two Lucene
>>>>>>>>> versions within the same Fuseki JAR?
>>>>>>>>> 
>>>>>>>>> -Osma
>>>>>>>>> 
>>>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>>> 
>>>>>>>>> 01.03.2017, 11:03, anuj kumar kirjoitti:
>>>>>>>>> 
>>>>>>>>>> Hi Osma,
>>>>>>>>>> 
>>>>>>>>>> I understand what you are saying. There are ways to mitigate risks
>>>>>>>>>> 
>>>>>>>>> and
>>>>> 
>>>>>> balance the refactoring without affecting the existing modules. But I
>>>>>>>>>> 
>>>>>>>>> will
>>>>>>>> 
>>>>>>>>> not delve into those now. I am not an expert in Jena to convincingly
>>>>>>>>>> 
>>>>>>>>> say
>>>>> 
>>>>>> that it is possible, without any hiccups. But I can take a guess and
>>>>>>>>>> 
>>>>>>>>> say
>>>>> 
>>>>>> that it is indeed possible :)
>>>>>>>>>> 
>>>>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>>>> 
>>>>>>>>>> I actually do not understand what you mean by mixing modules. I
>>>>>>>>>> 
>>>>>>>>> assume
>>>>> 
>>>>>> you
>>>>>>>> 
>>>>>>>>> mean having jena-text and jena-text-es as dependencies in a build
>>>>>>>>>> 
>>>>>>>>> without
>>>>>>>> 
>>>>>>>>> causing the build to conflict. If that is what you mean than the
>>>>>>>>>> 
>>>>>>>>> answer
>>>>> 
>>>>>> is
>>>>>>>> 
>>>>>>>>> yes it is possible and quite simple as well. Let me explain how it is
>>>>>>>>>> possible. But before that some assumption which I want to call out
>>>>>>>>>> explicitly.
>>>>>>>>>> 
>>>>>>>>>> *Assumption:*
>>>>>>>>>> 1. At a given point in time, only a single Indexing Technology is
>>>>>>>>>> 
>>>>>>>>> used
>>>>> 
>>>>>> for
>>>>>>>> 
>>>>>>>>> text based indexing and searching via Jean. What this means is that
>>>>>>>>>> 
>>>>>>>>> we
>>>>> 
>>>>>> will
>>>>>>>> 
>>>>>>>>> either use Lucene Implementation OR Solr Implementation OR ES
>>>>>>>>>> Implementation at any given point in time.
>>>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>>>>>> 
>>>>>>>>> but
>>>>> 
>>>>>> only on jena-text classes, if at all.
>>>>>>>>>> 
>>>>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>>>>> 
>>>>>>>>> contains
>>>>>>>> 
>>>>>>>>> jena-text based common classes + ES specific classes without any
>>>>>>>>>> compatibility issues. And it is infact quite simple. I did it in the
>>>>>>>>>> current jena-text-es module and ran the entire build which
>>>>>>>>>> succeeded.
>>>>>>>>>> The key is to include the latest Lucene dependencies at the very
>>>>>>>>>> 
>>>>>>>>> beginning
>>>>>>>> 
>>>>>>>>> in the pom and then include jena-text dependency. Maven will then
>>>>>>>>>> automatically resolve the dependency issues by including the Lucene
>>>>>>>>>> librarires that we included in our es specific pom. Have a look the
>>>>>>>>>> 
>>>>>>>>> pom
>>>>> 
>>>>>> of
>>>>>>>> 
>>>>>>>>> jena-text-es module here to see how it can be done :
>>>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Anuj Kumar
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
>>>>>>>>>> 
>>>>>>>>> osma.suominen@helsinki.fi>
>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Anuj,
>>>>>>>>>>> 
>>>>>>>>>>> I understand your concerns. However, we also need to balance
>>>>>>>>>>> between
>>>>>>>>>>> 
>>>>>>>>>> the
>>>>>>>> 
>>>>>>>>> needs of individual modules/features and the whole codebase. I'm
>>>>>>>>>>> 
>>>>>>>>>> willing to
>>>>>>>> 
>>>>>>>>> put in the effort to keep the other modules up to date with newer
>>>>>>>>>>> 
>>>>>>>>>> Lucene
>>>>>>>> 
>>>>>>>>> versions. Lucene upgrade requirements are well documented, the only
>>>>>>>>>>> 
>>>>>>>>>> hitches
>>>>>>>> 
>>>>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>>>>>>>> features that were dropped from newer versions.
>>>>>>>>>>> 
>>>>>>>>>>> A perhaps stupid question to more experienced Java developers: is
>>>>>>>>>>> it
>>>>>>>>>>> 
>>>>>>>>>> even
>>>>>>>> 
>>>>>>>>> possible to mix modules that depend on different versions of the
>>>>>>>>>>> 
>>>>>>>>>> Lucene
>>>>> 
>>>>>> libraries within the same project? In my (quite limited)
>>>>>>>>>>> 
>>>>>>>>>> understanding
>>>>> 
>>>>>> of
>>>>>>>> 
>>>>>>>>> Java projects and libraries, this requires special arrangements
>>>>>>>>>>> 
>>>>>>>>>> (e.g.
>>>>> 
>>>>>> shading) as the Java package/class namespace is shared by all the
>>>>>>>>>>> 
>>>>>>>>>> code
>>>>> 
>>>>>> running within the same JVM.
>>>>>>>>>>> 
>>>>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>>>>> 
>>>>>>>>>> jena-text
>>>>>>>> 
>>>>>>>>> module (depending on Lucene 4.x) and the new jena-text-es module
>>>>>>>>>>> 
>>>>>>>>>> (depending
>>>>>>>> 
>>>>>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>>> 
>>>>>>>>>>> -Osma
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 01.03.2017, 00:47, anuj kumar kirjoitti:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> My 2 Cents :
>>>>>>>>>>>> 
>>>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr
>>>>>>>>>>>> and
>>>>>>>>>>>> 
>>>>>>>>>>> ES is
>>>>>>>> 
>>>>>>>>> exactly for avoiding the "All or Nothing" approach we need to take
>>>>>>>>>>>> 
>>>>>>>>>>> if
>>>>> 
>>>>>> we
>>>>>>>> 
>>>>>>>>> club them all together. If they stay together and if in the near
>>>>>>>>>>>> 
>>>>>>>>>>> future I
>>>>>>>> 
>>>>>>>>> want to upgrade ES to another version, I also need to again upgrade
>>>>>>>>>>>> 
>>>>>>>>>>> Lucene
>>>>>>>> 
>>>>>>>>> and Solr and possibly another implementation that may have been
>>>>>>>>>>>> 
>>>>>>>>>>> added
>>>>> 
>>>>>> during the time. As we all know, this means weeks of work if not
>>>>>>>>>>>> 
>>>>>>>>>>> months to
>>>>>>>> 
>>>>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>>>>>>>> anything and I will probably start maintaining my version of
>>>>>>>>>>>> 
>>>>>>>>>>> Jena-Text as
>>>>>>>> 
>>>>>>>>> that would be much simpler to do than to upgrade and test and in
>>>>>>>>>>>> 
>>>>>>>>>>> the
>>>>> 
>>>>>> process own(read fix bugs) the upgrade for each and every
>>>>>>>>>>>> 
>>>>>>>>>>> technology.
>>>>> 
>>>>>> 
>>>>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>>>>> 
>>>>>>>>>>> independently
>>>>>>>> 
>>>>>>>>> of
>>>>>>>>>>>> each other and we can avoid situations where we cant upgrade to
>>>>>>>>>>>> 
>>>>>>>>>>> latest
>>>>> 
>>>>>> version of Lucene because we do not know what effect it will have
>>>>>>>>>>>> 
>>>>>>>>>>> on
>>>>> 
>>>>>> Solr
>>>>>>>> 
>>>>>>>>> Implementation.
>>>>>>>>>>>> 
>>>>>>>>>>>> We can start with having a separate Module for Jena Text ES and
>>>>>>>>>>>> see
>>>>>>>>>>>> 
>>>>>>>>>>> how
>>>>>>>> 
>>>>>>>>> things go. If they go well, we could extract out Solr and Lucene
>>>>>>>>>>>> 
>>>>>>>>>>> out
>>>>> 
>>>>>> of
>>>>>>>> 
>>>>>>>>> Jena Text.
>>>>>>>>>>>> 
>>>>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>>>>> 
>>>>>>>>>>> experience.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
>>>>>>>>>>>> 
>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> bb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> apache.org
>>>>> 
>>>>>> %3E
>>>>>>>> 
>>>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> and
>>>>> 
>>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to
>>>>>>>>>>>>> volunteer
>>>>>>>>>>>>> 
>>>>>>>>>>>> to do
>>>>>>>> 
>>>>>>>>> the actual work!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>>>> 
>>>>>>>>>>>> doing...
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> work to
>>>>>>>> 
>>>>>>>>> maintain it, so consider that just a very small and blurry data
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> point.
>>>>>>>> 
>>>>>>>>> :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out
>>>>>>>>>>>>> 
>>>>>>>>>>>> how
>>>>> 
>>>>>> to
>>>>>>>> 
>>>>>>>>> get
>>>>>>>>>>>>> it running... If you could just try that with some toy data, then
>>>>>>>>>>>>> 
>>>>>>>>>>>> your
>>>>>>>> 
>>>>>>>>> data
>>>>>>>>>>>>> point would be a lot less blurry :) I haven't used Solr for
>>>>>>>>>>>>> 
>>>>>>>>>>>> anything, so
>>>>>>>> 
>>>>>>>>> I'm not very familiar with how to set it up, and the jena-text
>>>>>>>>>>>>> instructions
>>>>>>>>>>>>> are pretty vague unfortunately.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Osma
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Osma Suominen
>>>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>>>> National Library of Finland
>>>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Osma Suominen
>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>> National Library of Finland
>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Osma Suominen
>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>> National Library of Finland
>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Osma Suominen
>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>> National Library of Finland
>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>> Tel. +358 50 3199529
>>>>>> osma.suominen@helsinki.fi
>>>>>> http://www.nationallibrary.fi
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> --
>>> Osma Suominen
>>> D.Sc. (Tech), Information Systems Specialist
>>> National Library of Finland
>>> P.O. Box 26 (Kaikukatu 4)
>>> 00014 HELSINGIN YLIOPISTO
>>> Tel. +358 50 3199529
>>> osma.suominen@helsinki.fi
>>> http://www.nationallibrary.fi
>>> 
>> 
>> 
>> 
> 
> 
> -- 
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suominen@helsinki.fi
> http://www.nationallibrary.fi


Re: Dropping Solr? Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by "Bruno P. Kinoshita" <ki...@apache.org>.
I think the only case recently where I had a project with Solr was when working with a Cloudera cluster (I believe it used to come pre-configured as a bundle of Cloudera Manager). But even then we had an ElasticSearch for the initial analysis before importing into the cluster.

I find ES easier to use. To be honest, even using Lucene API directly in Python and Java sometimes is easier for me :) so my +1

Cheers
Bruno


----- Original Message -----
From: Osma Suominen <os...@helsinki.fi>
To: dev@jena.apache.org
Sent: Saturday, 4 March 2017 1:57 AM
Subject: Dropping Solr? Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Hi Anuj,

You may be right about Solr. I didn't look at it at all and there are no 
Solr-specific unit tests.

I would like to drop Solr support altogether from jena-text. It has 
already bitrotted quite a bit, doesn't support any of the features added 
to jena-text in recent years, and from my perspective Elasticsearch 
would be a much better alternative if you need an external text index. 
Opinions?

Regarding issues.apache.org, you can register here:
https://issues.apache.org/jira/secure/Dashboard.jspa
There should be instructions available for everything.

-Osma



03.03.2017, 14:23, anuj kumar wrote:
> Hi Osma,
>  I briefly looked at the pull request. I believe we need to upgrade Lucene
> and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on
> Lucene 4.9.1.
>
> Also, how do I log into issues.apache.org, and where do I file this bug?
>
> Thanks,
> Anuj Kumar
>
> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <os...@helsinki.fi>
> wrote:
>
>> Hi Anuj,
>>
>> It's great that we found agreement over this!
>>
>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
>> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
>> as an intermediate step). I'll wait for comments on the PR and if people
>> think it's OK I will merge it soon to Jena master. Meanwhile, you can
>> already base your ES implementation on that branch [2] if you like.
>>
>> Could you please open a JIRA issue on issues.apache.org explaining the
>> Elasticsearch support feature, so that we have a place for tracking this
>> work, request comments etc.
>>
>> Also I suggest we move the discussion around this to the developers' list (
>> dev@jena.apache.org) where it's more appropriate.
>>
>> -Osma
>>
>> [1] https://github.com/apache/jena/pull/219
>>
>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>
>>
>> 03.03.2017, 02:45, anuj kumar wrote:
>>
>>> I second that. I am now finalising the integration of ES and should have a
>>> good production-quality implementation ready in a week's time. At that
>>> time I would like you guys to have a look at the implementation and
>>> provide feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge
>>> the code into the jena-text module and do a round of testing.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>
>>>> I do agree that trying to juggle different versions of Lucene libraries is
>>>> probably not a realistic option right now. Luckily (if I understand the
>>>> conversation thus far correctly) we have a solid alternative; getting our
>>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>>> Anuj's work into the mainstream of development. Someone please tell me
>>>> if I have that wrong! :grin:
>>>>
>>>> Let me reiterate that this seems like very good work and speaking for
>>>> myself, I certainly want to get it included into Jena. It's just a
>>>> question of fitting it in correctly, which might take a bit of time.
>>>>
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>>
>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>
>>>>> Hi Anuj!
>>>>>
>>>>> I have nothing against modularity in general. However, I cannot see how
>>>>> your proposal could work in practice for the Fuseki build, due to the
>>>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>>
>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>> again. If all current Jena modules (i.e. jena-text and jena-spatial) were
>>>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>>>> jena-text, right? I think that would be better for everyone than having
>>>>> to maintain your own separate module.
>>>>>
>>>>> -Osma
>>>>>
>>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>>
>>>>>> I personally have no preference as to how the code in Jena should be
>>>>>> structured, as long as I am able to use it :).
>>>>>> I have a personal preference for doing it in a specific way because, IMO,
>>>>>> it is modular, which makes it much easier to maintain in the long run.
>>>>>> But again, it may not be the quickest one.
>>>>>>
>>>>>> I have already been given a deadline by the company to have the ES
>>>>>> extension implemented in the next 15 days :). What this means is that I
>>>>>> will be maintaining the ES code extension to Jena Text at least locally
>>>>>> for the coming period. I would be more than happy to contribute to the
>>>>>> Jena community whatever is required to have a proper Elasticsearch
>>>>>> implementation in place, whether within the jena-text module or as a
>>>>>> separate module. Until Lucene and Solr are upgraded to the latest
>>>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>>>
>>>>>> Cheers!
>>>>>> Anuj Kumar
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>>
>>>>>>> Osma--
>>>>>>>
>>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>>> different versions of code accessible in different ways. The longer
>>>>>>> answer is that it's probably not a viable alternative for Jena for this
>>>>>>> problem, at least not without a lot of other change.
>>>>>>>
>>>>>>> You are right to point to the classloader mechanism as being at the
>>>>>>> heart of this question, but I must alter your remark just slightly. From
>>>>>>> "the Java classloader only sees a single, flat package/class namespace
>>>>>>> and a set of compiled classes" to "ANY GIVEN Java classloader only sees
>>>>>>> a single, flat package/class namespace and a set of compiled classes".
>>>>>>>
>>>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>>>> module boundaries (and even dynamic module relationships at run-time).
>>>>>>> Each OSGi bundle sees its own classloader, and the framework is
>>>>>>> responsible for connecting bundles up to ensure that every bundle has
>>>>>>> what it needs in the way of types to function, based on metadata that
>>>>>>> the bundles provide to the framework. It's an incredibly powerful system
>>>>>>> (I use it every day and enjoy it enormously) but it's also very "heavy"
>>>>>>> and requires a good deal of investment to use. In particular, it's
>>>>>>> probably too large to put _inside_ Jena. (I frequently put Jena inside
>>>>>>> an OSGi instance, on the other hand.)
>>>>>>>
>>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>>>> libraries. In theory, we could "roll our own" classloader management
>>>>>>> for this problem. That sounds like more than a bit of a rabbit hole to
>>>>>>> me. There might be another, more lightweight, toolkit out there to this
>>>>>>> purpose, but I'm not aware of any myself.
>>>>>>>
>>>>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>>>>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
>>>>>>> hardly a thing we want to do any more of than needed, I don't think.
>>>>>>>
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>>
>>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>>
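A. Soroka's correction above ("ANY GIVEN Java classloader only sees a single, flat package/class namespace") can be illustrated with a minimal sketch. It is not Jena or OSGi code: `LoaderIsolationDemo`, `Payload` and `IsolatingLoader` are hypothetical names, and `Payload` merely stands in for a library class such as a Lucene type. The sketch assumes Java 9 or later (for `InputStream.readAllBytes`) and that the compiled class file is reachable as a classpath resource.

```java
import java.io.IOException;
import java.io.InputStream;

public class LoaderIsolationDemo {

    /** Hypothetical stand-in for a library class such as a Lucene type. */
    public static class Payload { }

    /** Defines Payload itself instead of delegating that one name to its parent. */
    static final class IsolatingLoader extends ClassLoader {
        private static final String TARGET = Payload.class.getName();

        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!TARGET.equals(name)) {
                // Everything else uses the normal parent-first delegation.
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the compiled bytes of the class from the parent's resources. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Payload through a fresh isolating loader. */
    public static Class<?> loadIsolated() {
        ClassLoader parent = LoaderIsolationDemo.class.getClassLoader();
        try {
            return new IsolatingLoader(parent).loadClass(Payload.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> a = loadIsolated();
        Class<?> b = loadIsolated();
        System.out.println("same name:  " + a.getName().equals(b.getName()));
        System.out.println("same class: " + (a == b));
    }
}
```

The two loaders each hold a class with the same fully qualified name, yet the JVM treats them as unrelated types. This is the property that OSGi-style frameworks build on, and it also suggests why doing this by hand inside Jena would be the "rabbit hole" mentioned above: every object crossing a loader boundary has to be handled with care.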
>>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>>>>
>>>>>>>> Hi Anuj!
>>>>>>>>
>>>>>>>> Thanks for the clarification.
>>>>>>>>
>>>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>>>> convenient ways to structure a Java project. Maven cannot change the
>>>>>>>> fact that at runtime, module divisions don't really matter (except that
>>>>>>>> they usually correspond to package sub-namespaces) and the Java
>>>>>>>> classloader only sees a single, flat package/class namespace and a set
>>>>>>>> of compiled classes (usually within JARs) in the classpath that it
>>>>>>>> needs to check to find the right classes, and if there are two versions
>>>>>>>> of the same library (e.g. Lucene) with overlapping class names, that's
>>>>>>>> going to cause trouble. The only way around that is to shade some of
>>>>>>>> the libraries, i.e. rename them so that they end up in another,
>>>>>>>> non-conflicting namespace. Apparently Elasticsearch also did some of
>>>>>>>> that in the past [1] but nowadays tries to avoid it.
>>>>>>>>
>>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>>>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>>>>>> modules with all their dependencies, one of which requires the Lucene
>>>>>>>> 4.x classes and the other one the Lucene 6.4.1 classes? How do you
>>>>>>>> ensure that only one of them is used at a time, and that the Java
>>>>>>>> classloader, even though it has access to both versions of Lucene,
>>>>>>>> only loads classes from the single, correct one and not the other? Or
>>>>>>>> do you need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages,
>>>>>>>> so that you don't end up with two Lucene versions within the same
>>>>>>>> Fuseki JAR?
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>>
>>>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>>>
>>>>>>>>> Hi Osma,
>>>>>>>>>
>>>>>>>>> I understand what you are saying. There are ways to mitigate risks and
>>>>>>>>> balance the refactoring without affecting the existing modules. But I
>>>>>>>>> will not delve into those now. I am not an expert in Jena to
>>>>>>>>> convincingly say that it is possible, without any hiccups. But I can
>>>>>>>>> take a guess and say that it is indeed possible :)
>>>>>>>>>
>>>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>>>
>>>>>>>>> I actually do not understand what you mean by mixing modules. I assume
>>>>>>>>> you mean having jena-text and jena-text-es as dependencies in a build
>>>>>>>>> without causing the build to conflict. If that is what you mean, then
>>>>>>>>> the answer is yes, it is possible and quite simple as well. Let me
>>>>>>>>> explain how it is possible. But before that, some assumptions which I
>>>>>>>>> want to call out explicitly.
>>>>>>>>>
>>>>>>>>> *Assumptions:*
>>>>>>>>> 1. At a given point in time, only a single Indexing Technology is used
>>>>>>>>> for text based indexing and searching via Jena. What this means is
>>>>>>>>> that we will either use Lucene Implementation OR Solr Implementation
>>>>>>>>> OR ES Implementation at any given point in time.
>>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>>>>> but only on jena-text classes, if at all.
>>>>>>>>>
>>>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>>>> contains jena-text based common classes + ES specific classes without
>>>>>>>>> any compatibility issues. And it is in fact quite simple. I did it in
>>>>>>>>> the current jena-text-es module and ran the entire build, which
>>>>>>>>> succeeded.
>>>>>>>>> The key is to include the latest Lucene dependencies at the very
>>>>>>>>> beginning in the pom and then include the jena-text dependency. Maven
>>>>>>>>> will then automatically resolve the dependency issues by including
>>>>>>>>> the Lucene libraries that we included in our ES-specific pom. Have a
>>>>>>>>> look at the pom of the jena-text-es module here to see how it can be
>>>>>>>>> done:
>>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anuj Kumar
>>>>>>>>>
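The pom ordering described above relies on Maven's dependency mediation: for each artifact the nearest declaration in the dependency tree wins, and at equal depth the first declaration wins. The following toy model is only a sketch of that documented rule, not Maven's actual resolver. `MediationSketch` and `Dep` are hypothetical names, the artifact ids are shortened, and "3.x" is a placeholder version rather than a real release.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MediationSketch {

    /** One node of a dependency tree; children are kept in declaration order. */
    public static final class Dep {
        final String artifact;
        final String version;
        final List<Dep> children;

        private Dep(String artifact, String version, List<Dep> children) {
            this.artifact = artifact;
            this.version = version;
            this.children = children;
        }

        public static Dep of(String artifact, String version, Dep... children) {
            return new Dep(artifact, version, List.of(children));
        }
    }

    /**
     * Breadth-first walk over the tree. The first version seen for an artifact
     * is kept, which models "nearest wins, then first declared wins".
     */
    public static Map<String, String> resolve(List<Dep> directDependencies) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(directDependencies);
        while (!queue.isEmpty()) {
            Dep d = queue.removeFirst();
            // Only a winning node contributes its own dependencies.
            if (chosen.putIfAbsent(d.artifact, d.version) == null) {
                queue.addAll(d.children);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Dep lucene4 = Dep.of("lucene-core", "4.9.1");
        Dep lucene6 = Dep.of("lucene-core", "6.4.1");
        Dep jenaText = Dep.of("jena-text", "3.x", lucene4);       // placeholder version
        Dep jenaTextEs = Dep.of("jena-text-es", "3.x", lucene6);  // placeholder version

        // jena-text-es style pom: Lucene 6.4.1 declared directly, jena-text after it.
        System.out.println(resolve(List.of(lucene6, jenaText)));

        // Fuseki style build: both modules declared, Lucene only transitive.
        System.out.println(resolve(List.of(jenaText, jenaTextEs)));
    }
}
```

In the first case the direct Lucene 6.4.1 declaration wins; in the second, both Lucene versions sit at the same depth and the one reached first (4.9.1) wins. Either way exactly one Lucene version ends up in the build, so mediation chooses between the versions rather than letting both coexist, which is the concern raised in the reply quoted above.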
>>>>>>>>>
>>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen
>>>>>>>>> <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Anuj,
>>>>>>>>>>
>>>>>>>>>> I understand your concerns. However, we also need to balance between
>>>>>>>>>> the needs of individual modules/features and the whole codebase. I'm
>>>>>>>>>> willing to put in the effort to keep the other modules up to date
>>>>>>>>>> with newer Lucene versions. Lucene upgrade requirements are well
>>>>>>>>>> documented, the only hitches seen in JENA-1250 were related to how
>>>>>>>>>> jena-text (ab)used some Lucene features that were dropped from newer
>>>>>>>>>> versions.
>>>>>>>>>>
>>>>>>>>>> A perhaps stupid question to more experienced Java developers: is it
>>>>>>>>>> even possible to mix modules that depend on different versions of the
>>>>>>>>>> Lucene libraries within the same project? In my (quite limited)
>>>>>>>>>> understanding of Java projects and libraries, this requires special
>>>>>>>>>> arrangements (e.g. shading) as the Java package/class namespace is
>>>>>>>>>> shared by all the code running within the same JVM.
>>>>>>>>>>
>>>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>>>> jena-text module (depending on Lucene 4.x) and the new jena-text-es
>>>>>>>>>> module (depending on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>>
>>>>>>>>>> -Osma
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> My 2 Cents :
>>>>>>>>>>>
>>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and
>>>>>>>>>>> ES is exactly for avoiding the "All or Nothing" approach we need to
>>>>>>>>>>> take if we club them all together. If they stay together and if in
>>>>>>>>>>> the near future I want to upgrade ES to another version, I also need
>>>>>>>>>>> to again upgrade Lucene and Solr and possibly another implementation
>>>>>>>>>>> that may have been added during the time. As we all know, this means
>>>>>>>>>>> weeks of work if not months to get the changes released. This will
>>>>>>>>>>> personally de-motivate me to do anything and I will probably start
>>>>>>>>>>> maintaining my version of Jena-Text as that would be much simpler to
>>>>>>>>>>> do than to upgrade and test and in the process own (read: fix bugs
>>>>>>>>>>> in) the upgrade for each and every technology.
>>>>>>>>>>>
>>>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>>>> independently of each other and we can avoid situations where we
>>>>>>>>>>> can't upgrade to the latest version of Lucene because we do not know
>>>>>>>>>>> what effect it will have on the Solr Implementation.
>>>>>>>>>>>
>>>>>>>>>>> We can start with having a separate Module for Jena Text ES and see
>>>>>>>>>>> how things go. If they go well, we could extract Solr and Lucene out
>>>>>>>>>>> of Jena Text.
>>>>>>>>>>>
>>>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>>>> experience.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen
>>>>>>>>>>> <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>>>> and -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>>
>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer
>>>>>>>>>>>> to do the actual work!
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>>>> doing... that's pretty vague, I know, and I'm not in a position to
>>>>>>>>>>>>> do any work to maintain it, so consider that just a very small and
>>>>>>>>>>>>> blurry data point. :)
>>>>>>>>>>>>
>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how
>>>>>>>>>>>> to get it running... If you could just try that with some toy data,
>>>>>>>>>>>> then your data point would be a lot less blurry :) I haven't used
>>>>>>>>>>>> Solr for anything, so I'm not very familiar with how to set it up,
>>>>>>>>>>>> and the jena-text instructions are pretty vague unfortunately.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Osma


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Dropping Solr? Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Anuj,

You may be right about Solr. I didn't look at it at all and there are no 
Solr-specific unit tests.

I would like to drop Solr support altogether from jena-text. It has 
already bitrotted quite a bit, doesn't support any of the features added 
to jena-text in recent years, and from my perspective Elasticsearch 
would be a much better alternative if you need an external text index. 
Opinions?

Regarding issues.apache.org, you can register here:
https://issues.apache.org/jira/secure/Dashboard.jspa
There should be instructions available for everything.

-Osma



03.03.2017, 14:23, anuj kumar wrote:
> Hi Osma,
> I briefly looked at the pull request. I believe we need to upgrade Lucene
> and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on
> Lucene 4.9.1.
>
> Also, how do I log into issues.apache.org, and where should I file this bug?
>
> Thanks,
> Anuj Kumar
>
> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <os...@helsinki.fi>
> wrote:
>
>> Hi Anuj,
>>
>> It's great that we found agreement over this!
>>
>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
>> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
>> as an intermediate step). I'll wait for comments on the PR and if people
>> think it's OK I will merge it soon to Jena master. Meanwhile, you can
>> already base your ES implementation on that branch [2] if you like.
>>
>> Could you please open a JIRA issue on issues.apache.org explaining the
>> Elasticsearch support feature, so that we have a place for tracking this
>> work, request comments etc.
>>
>> Also I suggest we move the discussion around this to the developers' list
>> (dev@jena.apache.org) where it's more appropriate.
>>
>> -Osma
>>
>> [1] https://github.com/apache/jena/pull/219
>>
>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>
>>
>> 03.03.2017, 02:45, anuj kumar wrote:
>>
>>> I second that. I am now finalising the integration of ES and should have a
>>> good production-quality implementation ready in a week's time. At that
>>> time I would like you guys to have a look at the implementation and provide
>>> feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge the
>>> code into the jena-text module and do a round of testing.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>
>>>> I do agree that trying to juggle different versions of Lucene libraries is
>>>> probably not a realistic option right now. Luckily (if I understand the
>>>> conversation thus far correctly) we have a solid alternative; getting our
>>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>>> Anuj's work into the mainstream of development. Someone please tell me if I
>>>> have that wrong! :grin:
>>>>
>>>> Let me reiterate that this seems like very good work and speaking for
>>>> myself, I certainly want to get it included into Jena. It's just a question
>>>> of fitting it in correctly, which might take a bit of time.
>>>>
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>>
>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>
>>>>> Hi Anuj!
>>>>>
>>>>> I have nothing against modularity in general. However, I cannot see how
>>>>> your proposal could work in practice for the Fuseki build, due to the
>>>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>>
>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>> again. If all current Jena modules (i.e. jena-text and jena-spatial) were
>>>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>>>> jena-text, right? I think that would be better for everyone than having to
>>>>> maintain your own separate module.
>>>>>
>>>>> -Osma
>>>>>
>>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>>
>>>>>> I personally have no preference as to how the code in Jena should be
>>>>>> structured, as long as I am able to use it :).
>>>>>> I have a personal preference for doing it in a specific way because, IMO,
>>>>>> it is modular, which makes it much easier to maintain in the long run. But
>>>>>> again, it may not be the quickest one.
>>>>>>
>>>>>> I have already been given a deadline by the company to have the ES
>>>>>> extension implemented in the next 15 days :). What this means is that I
>>>>>> will be maintaining the ES code extension to Jena Text at least locally for
>>>>>> the coming period of time. I would be more than happy to contribute to the
>>>>>> Jena community whatever is required to have a proper ElasticSearch
>>>>>> implementation in place, whether within the jena-text module or as a
>>>>>> separate module. Until Lucene and Solr are upgraded to the latest
>>>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>>>
>>>>>> Cheers!
>>>>>> Anuj Kumar
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>>
>>>>>>> Osma--
>>>>>>>
>>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>>> different versions of code accessible in different ways. The longer
>>>>>>> answer is that it's probably not a viable alternative for Jena for this
>>>>>>> problem, at least not without a lot of other change.
>>>>>>>
>>>>>>> You are right to point to the classloader mechanism as being at the heart
>>>>>>> of this question, but I must alter your remark just slightly. From "the
>>>>>>> Java classloader only sees a single, flat package/class namespace and a set
>>>>>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>>>>>> flat package/class namespace and a set of compiled classes".
>>>>>>>
>>>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>>>> module boundaries (and even dynamic module relationships at run-time). Each
>>>>>>> OSGi bundle sees its own classloader, and the framework is responsible for
>>>>>>> connecting bundles up to ensure that every bundle has what it needs in the
>>>>>>> way of types to function, based on metadata that the bundles provide to the
>>>>>>> framework. It's an incredibly powerful system (I use it every day and enjoy
>>>>>>> it enormously) but it's also very "heavy" and requires a good deal of
>>>>>>> investment to use. In particular, it's probably too large to put _inside_
>>>>>>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>>>>>>
>>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>>>> libraries. In theory, we could "roll our own" classloader management for
>>>>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>>>>> There might be another, more lightweight, toolkit out there to this
>>>>>>> purpose, but I'm not aware of any myself.
>>>>>>>
>>>>>>> Otherwise, yes, you get into shading and the like. We have to do that for
>>>>>>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>>>>>>> thing we want to do any more of than needed, I don't think.
>>>>>>>
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>>
>>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>>
>>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>>>>
>>>>>>>> Hi Anuj!
>>>>>>>>
>>>>>>>> Thanks for the clarification.
>>>>>>>>
>>>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>>>> convenient ways to structure a Java project. Maven cannot change the fact
>>>>>>>> that at runtime, module divisions don't really matter (except that they
>>>>>>>> usually correspond to package sub-namespaces) and the Java classloader only
>>>>>>>> sees a single, flat package/class namespace and a set of compiled classes
>>>>>>>> (usually within JARs) in the classpath that it needs to check to find the
>>>>>>>> right classes, and if there are two versions of the same library (eg
>>>>>>>> Lucene) with overlapping class names, that's going to cause trouble. The
>>>>>>>> only way around that is to shade some of the libraries, i.e. rename them so
>>>>>>>> that they end up in another, non-conflicting namespace. Apparently
>>>>>>>> Elasticsearch also did some of that in the past [1] but nowadays tries to
>>>>>>>> avoid it.
>>>>>>>>
>>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>>> Indexing Technology is used") imply that in the assembler configuration,
>>>>>>>> you cannot have ja:loadClass declarations for both Lucene and ES backends?
>>>>>>>> Or how do you run something like Fuseki that contains (in a single big JAR)
>>>>>>>> both the jena-text and jena-text-es modules with all their dependencies,
>>>>>>>> one of which requires the Lucene 4.x classes and the other one the Lucene
>>>>>>>> 6.4.1 classes? How do you ensure that only one of them is used at a time,
>>>>>>>> and that the Java classloader, even though it has access to both versions
>>>>>>>> of Lucene, only loads classes from the single, correct one and not the
>>>>>>>> other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
>>>>>>>> packages, so that you don't end up with two Lucene versions within the same
>>>>>>>> Fuseki JAR?
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>>
>>>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>>>
>>>>>>>>> Hi Osma,
>>>>>>>>>
>>>>>>>>> I understand what you are saying. There are ways to mitigate risks and
>>>>>>>>> balance the refactoring without affecting the existing modules. But I will
>>>>>>>>> not delve into those now. I am not enough of an expert in Jena to
>>>>>>>>> convincingly say that it is possible without any hiccups. But I can take a
>>>>>>>>> guess and say that it is indeed possible :)
>>>>>>>>>
>>>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>>>
>>>>>>>>> I actually do not understand what you mean by mixing modules. I assume you
>>>>>>>>> mean having jena-text and jena-text-es as dependencies in a build without
>>>>>>>>> causing the build to conflict. If that is what you mean, then the answer is
>>>>>>>>> yes, it is possible and quite simple as well. Let me explain how it is
>>>>>>>>> possible. But before that, some assumptions which I want to call out
>>>>>>>>> explicitly.
>>>>>>>>>
>>>>>>>>> *Assumptions:*
>>>>>>>>> 1. At a given point in time, only a single Indexing Technology is used for
>>>>>>>>> text based indexing and searching via Jena. What this means is that we will
>>>>>>>>> either use Lucene Implementation OR Solr Implementation OR ES
>>>>>>>>> Implementation at any given point in time.
>>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but
>>>>>>>>> only on jena-text classes, if at all.
>>>>>>>>>
>>>>>>>>> Based on these assumptions it is possible to create a build that contains
>>>>>>>>> jena-text based common classes + ES specific classes without any
>>>>>>>>> compatibility issues. And it is in fact quite simple. I did it in the
>>>>>>>>> current jena-text-es module and ran the entire build, which succeeded.
>>>>>>>>> The key is to include the latest Lucene dependencies at the very beginning
>>>>>>>>> in the pom and then include the jena-text dependency. Maven will then
>>>>>>>>> automatically resolve the dependency issues by including the Lucene
>>>>>>>>> libraries that we included in our ES-specific pom. Have a look at the pom
>>>>>>>>> of the jena-text-es module here to see how it can be done:
>>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anuj Kumar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Anuj,
>>>>>>>>>>
>>>>>>>>>> I understand your concerns. However, we also need to balance between the
>>>>>>>>>> needs of individual modules/features and the whole codebase. I'm willing to
>>>>>>>>>> put in the effort to keep the other modules up to date with newer Lucene
>>>>>>>>>> versions. Lucene upgrade requirements are well documented, the only hitches
>>>>>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>>>>>>> features that were dropped from newer versions.
>>>>>>>>>>
>>>>>>>>>> A perhaps stupid question to more experienced Java developers: is it even
>>>>>>>>>> possible to mix modules that depend on different versions of the Lucene
>>>>>>>>>> libraries within the same project? In my (quite limited) understanding of
>>>>>>>>>> Java projects and libraries, this requires special arrangements (e.g.
>>>>>>>>>> shading) as the Java package/class namespace is shared by all the code
>>>>>>>>>> running within the same JVM.
>>>>>>>>>>
>>>>>>>>>> So can you create, say, a Fuseki build that contains the current jena-text
>>>>>>>>>> module (depending on Lucene 4.x) and the new jena-text-es module (depending
>>>>>>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>>
>>>>>>>>>> -Osma
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> My 2 Cents :
>>>>>>>>>>>
>>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and ES is
>>>>>>>>>>> exactly for avoiding the "All or Nothing" approach we need to take if we
>>>>>>>>>>> club them all together. If they stay together and if in the near future I
>>>>>>>>>>> want to upgrade ES to another version, I also need to again upgrade Lucene
>>>>>>>>>>> and Solr and possibly another implementation that may have been added
>>>>>>>>>>> during the time. As we all know, this means weeks of work if not months to
>>>>>>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>>>>>>> anything and I will probably start maintaining my version of Jena-Text as
>>>>>>>>>>> that would be much simpler to do than to upgrade and test and in the
>>>>>>>>>>> process own (read: fix bugs) the upgrade for each and every technology.
>>>>>>>>>>>
>>>>>>>>>>> If they are developed as separate modules, they can evolve independently of
>>>>>>>>>>> each other and we can avoid situations where we can't upgrade to the latest
>>>>>>>>>>> version of Lucene because we do not know what effect it will have on the
>>>>>>>>>>> Solr Implementation.
>>>>>>>>>>>
>>>>>>>>>>> We can start with having a separate Module for Jena Text ES and see how
>>>>>>>>>>> things go. If they go well, we could extract Solr and Lucene out of
>>>>>>>>>>> Jena Text.
>>>>>>>>>>>
>>>>>>>>>>> Again this is just a suggestion based on my limited industry experience.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>>>>>> ? In other words, might it be better to factor out between -text and
>>>>>>>>>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>>
>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer to do
>>>>>>>>>>>> the actual work!
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't use the Solr component now, but I could easily see so doing...
>>>>>>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any work to
>>>>>>>>>>>>> maintain it, so consider that just a very small and blurry data point. :)
>>>>>>>>>>>>
>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how to get
>>>>>>>>>>>> it running... If you could just try that with some toy data, then your data
>>>>>>>>>>>> point would be a lot less blurry :) I haven't used Solr for anything, so
>>>>>>>>>>>> I'm not very familiar with how to set it up, and the jena-text instructions
>>>>>>>>>>>> are pretty vague unfortunately.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Osma


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi
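
An aside on the classloader point A. Soroka makes in the thread quoted
above ("ANY GIVEN Java classloader only sees a single, flat package/class
namespace"): below is a minimal Java sketch, not taken from Jena or OSGi,
of why the same class name can exist twice in one JVM. The names
LoaderIsolationDemo, Library and IsolatingLoader are hypothetical, and
Library merely stands in for a library class such as one from Lucene. The
sketch assumes Java 9 or later (for InputStream.readAllBytes) and that the
compiled classes can be read as classpath resources.

```java
import java.io.IOException;
import java.io.InputStream;

public class LoaderIsolationDemo {

    /** Hypothetical stand-in for a library class (e.g. something from Lucene). */
    public static class Library {
        public static int counter = 0;
    }

    /** Defines one named class itself; delegates everything else to the parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // normal parent-first lookup
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the compiled bytes of the class through the parent loader. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes(); // Java 9+
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a fresh copy of Library in a new, separate classloader. */
    public static Class<?> loadIsolated() throws ClassNotFoundException {
        ClassLoader parent = LoaderIsolationDemo.class.getClassLoader();
        String name = Library.class.getName();
        return new IsolatingLoader(parent, name).loadClass(name);
    }

    public static void main(String[] args) throws Exception {
        Class<?> a = loadIsolated();
        Class<?> b = loadIsolated();
        System.out.println(a.getName().equals(b.getName()));    // true: same name
        System.out.println(a == b);                             // false: distinct classes
        a.getField("counter").setInt(null, 42);                 // touch only copy "a"
        System.out.println(b.getField("counter").getInt(null)); // 0: separate statics
        System.out.println(Library.counter);                    // 0: the application's own copy
    }
}
```

This is roughly the mechanism that OSGi-style isolation builds on. It does
not by itself make two Lucene versions usable from a single Fuseki JAR,
since something still has to decide which loader serves which module, which
is the "rabbit hole" mentioned in the thread.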

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
Created JENA-1305 <https://issues.apache.org/jira/browse/JENA-1305> to
track the Jena ElasticSearch implementation.
Can someone please add the required features that need to be supported? I
have added what I can think of.

Thanks,
Anuj Kumar

On Sat, Mar 4, 2017 at 3:41 PM, anuj kumar <an...@gmail.com> wrote:

> Hi Osma,
> I just subscribed to the Dev mailing list.
> I think you are right. If we know that there's no one using Solr it would
> actually be wise to drop it in favour of ElasticSearch.
>
> Thanks,
> Anuj Kumar
>
>
> On Fri, Mar 3, 2017 at 4:51 PM, Osma Suominen <os...@helsinki.fi>
> wrote:
>
>> Hi Anuj,
>>
>> Did you see my earlier message to the dev list? Are you subscribed to
>> that list? I will Cc: you this time just to be sure. See
>> http://jena.markmail.org/thread/uhs6cuhotzj4tjrj for the actual message
>> in case you missed it (including some replies).
>>
>> I see what you mean by deprecating Solr first before removing it, but I
>> can't figure out how that would work in practice. If you're right about
>> Solr 4.9.1 requiring Lucene 4.9.1, then we can't have Solr and ES support
>> in Jena at the same time - unless we upgrade the Solr side as well, which
>> seems a bit of a waste of time if you're going to remove it anyway.
>>
>> Like I explained in JENA-1301 there are many problems with the Solr
>> implementation and I doubt there are many users, quite possibly nobody at
>> all.
>>
>> In any case switching indexing technologies for jena-text should be
>> rather easy, as the text index itself doesn't need to be migrated - it can
>> simply be rebuilt from the RDF data. So if someone runs, say, Fuseki 2.5.0
>> with a Solr index, then upgrading to (as yet hypothetical) Fuseki 2.6.0
>> with an ES index instead is just a matter of setting up ES, changing the
>> text index configuration slightly and running jena.textindexer (or
>> reloading the data, whichever is easier). There is no technical benefit
>> from having support for both Solr and ES in the same Jena release as it
>> doesn't make migration any easier, but of course, advance warning might
>> help with planning the move to ES.
>>
>> -Osma
>>
>>
>>
>> 03.03.2017, 16:43, anuj kumar wrote:
>>
>>> Hey,
>>> I just saw https://issues.apache.org/jira/browse/JENA-1301
>>> Should we not first officially deprecate it and give any users of Solr a
>>> chance to move to a different indexing technology?
>>>
>>> BTW, I don't know yet how to log in to Apache JIRA.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>> On Fri, Mar 3, 2017 at 1:23 PM, anuj kumar <an...@gmail.com>
>>> wrote:
>>>
>>>> Hi Osma,
>>>> I briefly looked at the pull request. I believe we need to upgrade Lucene
>>>> and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on
>>>> Lucene 4.9.1.
>>>>
>>>> Also, how do I log into issues.apache.org, and where should I file this bug?
>>>>
>>>> Thanks,
>>>> Anuj Kumar
>>>>
>>>> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <osma.suominen@helsinki.fi>
>>>> wrote:
>>>>
>>>>> Hi Anuj,
>>>>>
>>>>> It's great that we found agreement over this!
>>>>>
>>>>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
>>>>> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
>>>>> as an intermediate step). I'll wait for comments on the PR and if people
>>>>> think it's OK I will merge it soon to Jena master. Meanwhile, you can
>>>>> already base your ES implementation on that branch [2] if you like.
>>>>>
>>>>> Could you please open a JIRA issue on issues.apache.org explaining the
>>>>> Elasticsearch support feature, so that we have a place for tracking this
>>>>> work, request comments etc.
>>>>>
>>>>> Also I suggest we move the discussion around this to the developers' list
>>>>> (dev@jena.apache.org) where it's more appropriate.
>>>>>
>>>>> -Osma
>>>>>
>>>>> [1] https://github.com/apache/jena/pull/219
>>>>>
>>>>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>>>>
>>>>>
>>>>> 03.03.2017, 02:45, anuj kumar wrote:
>>>>>
>>>>>> I second that. I am now finalising the integration of ES and should have a
>>>>>> good production-quality implementation ready in a week's time. At that
>>>>>> time I would like you guys to have a look at the implementation and provide
>>>>>> feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge the
>>>>>> code into the jena-text module and do a round of testing.
>>>>>>
>>>>>> Thanks,
>>>>>> Anuj Kumar
>>>>>>
>>>>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>>>>
>>>>>>> I do agree that trying to juggle different versions of Lucene libraries is
>>>>>>> probably not a realistic option right now. Luckily (if I understand the
>>>>>>> conversation thus far correctly) we have a solid alternative; getting our
>>>>>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>>>>>> Anuj's work into the mainstream of development. Someone please tell me if I
>>>>>>> have that wrong! :grin:
>>>>>>>
>>>>>>> Let me reiterate that this seems like very good work and speaking for
>>>>>>> myself, I certainly want to get it included into Jena. It's just a question
>>>>>>> of fitting it in correctly, which might take a bit of time.
>>>>>>>
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>>
>>>>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <osma.suominen@helsinki.fi> wrote:
>>>>>>>
>>>>>>>> Hi Anuj!
>>>>>>>>
>>>>>>>> I have nothing against modularity in general. However, I cannot see how
>>>>>>>> your proposal could work in practice for the Fuseki build, due to the
>>>>>>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>>>>>
>>>>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>>>>> again. If all current Jena modules (i.e. jena-text and jena-spatial) were
>>>>>>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>>>>>>> jena-text, right? I think that would be better for everyone than having to
>>>>>>>> maintain your own separate module.
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>>>>>
>>>>>>>>> I personally have no preference as to how the code in Jena should be
>>>>>>>>> structured, as long as I am able to use it :).
>>>>>>>>> I have a personal preference for doing it in a specific way because, IMO,
>>>>>>>>> it is modular, which makes it much easier to maintain in the long run. But
>>>>>>>>> again, it may not be the quickest one.
>>>>>>>>>
>>>>>>>>> I have already been given a deadline by the company to have the ES
>>>>>>>>> extension implemented in the next 15 days :). What this means is that I
>>>>>>>>> will be maintaining the ES code extension to Jena Text at least locally for
>>>>>>>>> the coming period of time. I would be more than happy to contribute to the
>>>>>>>>> Jena community whatever is required to have a proper ElasticSearch
>>>>>>>>> implementation in place, whether within the jena-text module or as a
>>>>>>>>> separate module. Until Lucene and Solr are upgraded to the latest
>>>>>>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>>>>>>
>>>>>>>>> Cheers!
>>>>>>>>> Anuj Kumar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Osma--
>>>>>>>>>>
>>>>>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>>>>>> different versions of code accessible in different ways. The longer
>>>>>>>>>> answer is that it's probably not a viable alternative for Jena for this
>>>>>>>>>> problem, at least not without a lot of other change.
>>>>>>>>>>
>>>>>>>>>> You are right to point to the classloader mechanism as being at the heart
>>>>>>>>>> of this question, but I must alter your remark just slightly. From "the
>>>>>>>>>> Java classloader only sees a single, flat package/class namespace and a set
>>>>>>>>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>>>>>>>>> flat package/class namespace and a set of compiled classes".
>>>>>>>>>>
>>>>>>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>>>>>>> module boundaries (and even dynamic module relationships at run-time). Each
>>>>>>>>>> OSGi bundle sees its own classloader, and the framework is responsible for
>>>>>>>>>> connecting bundles up to ensure that every bundle has what it needs in the
>>>>>>>>>> way of types to function, based on metadata that the bundles provide to the
>>>>>>>>>> framework. It's an incredibly powerful system (I use it every day and enjoy
>>>>>>>>>> it enormously) but it's also very "heavy" and requires a good deal of
>>>>>>>>>> investment to use. In particular, it's probably too large to put _inside_
>>>>>>>>>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>>>>>>>>>
>>>>>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>>>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>>>>>>> libraries. In theory, we could "roll our own" classloader management for
>>>>>>>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>>>>>>>> There might be another, more lightweight, toolkit out there to this
>>>>>>>>>> purpose, but I'm not aware of any myself.
>>>>>>>>>>
>>>>>>>>>> Otherwise, yes, you get into shading and the like. We have to do that for
>>>>>>>>>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>>>>>>>>>> thing we want to do any more of than needed, I don't think.
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> A. Soroka
>>>>>>>>>> The University of Virginia Library
>>>>>>>>>>
>>>>>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>>>>>
>>>>>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suominen@helsinki.fi>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Anuj!
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the clarification.
>>>>>>>>>>>
>>>>>>>>>>> However, I'm still not sure I understand the situation
>>>>>>>>>>> completely. I know Maven can perform a lot of tricks, but Maven
>>>>>>>>>>> modules are just convenient ways to structure a Java project.
>>>>>>>>>>> Maven cannot change the fact that at runtime, module divisions
>>>>>>>>>>> don't really matter (except that they usually correspond to
>>>>>>>>>>> package sub-namespaces) and the Java classloader only sees a
>>>>>>>>>>> single, flat package/class namespace and a set of compiled
>>>>>>>>>>> classes (usually within JARs) in the classpath that it needs to
>>>>>>>>>>> check to find the right classes, and if there are two versions
>>>>>>>>>>> of the same library (eg Lucene) with overlapping class names,
>>>>>>>>>>> that's going to cause trouble. The only way around that is to
>>>>>>>>>>> shade some of the libraries, i.e. rename them so that they end
>>>>>>>>>>> up in another, non-conflicting namespace. Apparently
>>>>>>>>>>> Elasticsearch also did some of that in the past [1] but
>>>>>>>>>>> nowadays tries to avoid it.
>>>>>>>>>>>
>>>>>>>>>>> Does your assumption 1 ("At a given point in time, only a
>>>>>>>>>>> single Indexing Technology is used") imply that in the
>>>>>>>>>>> assembler configuration, you cannot have ja:loadClass
>>>>>>>>>>> declarations for both Lucene and ES backends? Or how do you run
>>>>>>>>>>> something like Fuseki that contains (in a single big JAR) both
>>>>>>>>>>> the jena-text and jena-text-es modules with all their
>>>>>>>>>>> dependencies, one of which requires the Lucene 4.x classes and
>>>>>>>>>>> the other one the Lucene 6.4.1 classes? How do you ensure that
>>>>>>>>>>> only one of them is used at a time, and that the Java
>>>>>>>>>>> classloader, even though it has access to both versions of
>>>>>>>>>>> Lucene, only loads classes from the single, correct one and not
>>>>>>>>>>> the other? Or do you need to have separate "Fuseki-Lucene" and
>>>>>>>>>>> "Fuseki-ES" packages, so that you don't end up with two Lucene
>>>>>>>>>>> versions within the same Fuseki JAR?
>>>>>>>>>>>
>>>>>>>>>>> -Osma
>>>>>>>>>>>
>>>>>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>>>>>
>>>>>>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Osma,
>>>>>>>>>>>>
>>>>>>>>>>>> I understand what you are saying. There are ways to mitigate
>>>>>>>>>>>> risks and balance the refactoring without affecting the
>>>>>>>>>>>> existing modules. But I will not delve into those now. I am
>>>>>>>>>>>> not enough of a Jena expert to say convincingly that it is
>>>>>>>>>>>> possible without any hiccups. But I can take a guess and say
>>>>>>>>>>>> that it is indeed possible :)
>>>>>>>>>>>>
>>>>>>>>>>>> For the question: "is it even possible to mix modules that
>>>>>>>>>>>> depend on different versions of the Lucene libraries within
>>>>>>>>>>>> the same project?"
>>>>>>>>>>>>
>>>>>>>>>>>> I actually do not understand what you mean by mixing modules.
>>>>>>>>>>>> I assume you mean having jena-text and jena-text-es as
>>>>>>>>>>>> dependencies in a build without causing the build to conflict.
>>>>>>>>>>>> If that is what you mean, then the answer is yes, it is
>>>>>>>>>>>> possible and quite simple as well. Let me explain how it is
>>>>>>>>>>>> possible. But before that, some assumptions which I want to
>>>>>>>>>>>> call out explicitly.
>>>>>>>>>>>>
>>>>>>>>>>>> *Assumptions:*
>>>>>>>>>>>> 1. At a given point in time, only a single Indexing Technology
>>>>>>>>>>>> is used for text based indexing and searching via Jena. What
>>>>>>>>>>>> this means is that we will either use Lucene Implementation OR
>>>>>>>>>>>> Solr Implementation OR ES Implementation at any given point in
>>>>>>>>>>>> time.
>>>>>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific
>>>>>>>>>>>> classes but only on jena-text classes, if at all.
>>>>>>>>>>>>
>>>>>>>>>>>> Based on these assumptions it is possible to create a build
>>>>>>>>>>>> that contains jena-text based common classes + ES specific
>>>>>>>>>>>> classes without any compatibility issues. And it is in fact
>>>>>>>>>>>> quite simple. I did it in the current jena-text-es module and
>>>>>>>>>>>> ran the entire build, which succeeded. The key is to include
>>>>>>>>>>>> the latest Lucene dependencies at the very beginning in the
>>>>>>>>>>>> pom and then include the jena-text dependency. Maven will then
>>>>>>>>>>>> automatically resolve the dependency issues by including the
>>>>>>>>>>>> Lucene libraries that we included in our ES specific pom. Have
>>>>>>>>>>>> a look at the pom of the jena-text-es module here to see how
>>>>>>>>>>>> it can be done:
>>>>>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen
>>>>>>>>>>>> <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Anuj,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I understand your concerns. However, we also need to balance
>>>>>>>>>>>>> between the needs of individual modules/features and the
>>>>>>>>>>>>> whole codebase. I'm willing to put in the effort to keep the
>>>>>>>>>>>>> other modules up to date with newer Lucene versions. Lucene
>>>>>>>>>>>>> upgrade requirements are well documented, the only hitches
>>>>>>>>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some
>>>>>>>>>>>>> Lucene features that were dropped from newer versions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A perhaps stupid question to more experienced Java
>>>>>>>>>>>>> developers: is it even possible to mix modules that depend on
>>>>>>>>>>>>> different versions of the Lucene libraries within the same
>>>>>>>>>>>>> project? In my (quite limited) understanding of Java projects
>>>>>>>>>>>>> and libraries, this requires special arrangements (e.g.
>>>>>>>>>>>>> shading) as the Java package/class namespace is shared by all
>>>>>>>>>>>>> the code running within the same JVM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So can you create, say, a Fuseki build that contains the
>>>>>>>>>>>>> current jena-text module (depending on Lucene 4.x) and the
>>>>>>>>>>>>> new jena-text-es module (depending on Lucene 6.4.1) without
>>>>>>>>>>>>> any compatibility issues?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Osma
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My 2 Cents :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The reason I proposed to have separate modules for Lucene,
>>>>>>>>>>>>>> Solr and ES is exactly for avoiding the "All or Nothing"
>>>>>>>>>>>>>> approach we need to take if we club them all together. If
>>>>>>>>>>>>>> they stay together and if in the near future I want to
>>>>>>>>>>>>>> upgrade ES to another version, I also need to again upgrade
>>>>>>>>>>>>>> Lucene and Solr and possibly another implementation that may
>>>>>>>>>>>>>> have been added during the time. As we all know, this means
>>>>>>>>>>>>>> weeks of work if not months to get the changes released.
>>>>>>>>>>>>>> This will personally de-motivate me to do anything and I
>>>>>>>>>>>>>> will probably start maintaining my version of Jena-Text as
>>>>>>>>>>>>>> that would be much simpler to do than to upgrade and test
>>>>>>>>>>>>>> and in the process own (read: fix bugs) the upgrade for each
>>>>>>>>>>>>>> and every technology.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>>>>>>> independently of each other and we can avoid situations
>>>>>>>>>>>>>> where we can't upgrade to the latest version of Lucene
>>>>>>>>>>>>>> because we do not know what effect it will have on the Solr
>>>>>>>>>>>>>> Implementation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We can start with having a separate Module for Jena Text ES
>>>>>>>>>>>>>> and see how things go. If they go well, we could extract
>>>>>>>>>>>>>> Solr and Lucene out of Jena Text.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen
>>>>>>>>>>>>>> <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>>>>>>>>> ? In other words, might it be better to factor out between
>>>>>>>>>>>>>>>> -text and -spatial and _then_ try to upgrade the Lucene
>>>>>>>>>>>>>>>> version?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to
>>>>>>>>>>>>>>> volunteer to do the actual work!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't use the Solr component now, but I could easily see
>>>>>>>>>>>>>>>> so doing... that's pretty vague, I know, and I'm not in a
>>>>>>>>>>>>>>>> position to do any work to maintain it, so consider that
>>>>>>>>>>>>>>>> just a very small and blurry data point. :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure
>>>>>>>>>>>>>>> out how to get it running... If you could just try that
>>>>>>>>>>>>>>> with some toy data, then your data point would be a lot
>>>>>>>>>>>>>>> less blurry :) I haven't used Solr for anything, so I'm not
>>>>>>>>>>>>>>> very familiar with how to set it up, and the jena-text
>>>>>>>>>>>>>>> instructions are pretty vague unfortunately.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Osma
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Osma Suominen
>>>>>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>>>>>> National Library of Finland
>>>>>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>> Osma Suominen
>>>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>>>> National Library of Finland
>>>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Osma Suominen
>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>> National Library of Finland
>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Osma Suominen
>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>> National Library of Finland
>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>> Tel. +358 50 3199529
>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>> Osma Suominen
>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>> National Library of Finland
>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>> 00014 HELSINGIN YLIOPISTO
>>>>> Tel. +358 50 3199529
>>>>> osma.suominen@helsinki.fi
>>>>> http://www.nationallibrary.fi
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Anuj Kumar*
>>>>
>>>>
>>>
>>>
>>>
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suominen@helsinki.fi
>> http://www.nationallibrary.fi
>>
>
>
>
> --
> *Anuj Kumar*
>



-- 
*Anuj Kumar*
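
The Maven behaviour relied on in the quoted message from 01.03.2017, 11:03 (declaring the newer Lucene artifacts directly in the jena-text-es pom so that they take precedence over the ones jena-text pulls in) can be illustrated with a minimal sketch. It is a simplified Python model of Maven's documented "nearest definition wins" dependency mediation, not Maven's actual resolver. The dependency graph, the `mediate` function and the `x.y.z` placeholder version are assumptions made up for illustration; only the Lucene version numbers come from the thread.

```python
from collections import deque

# Hypothetical, simplified dependency graph: artifact -> list of
# (artifact, version) pairs it declares, in declaration order.
# "x.y.z" is a placeholder, not a real jena-text version.
DECLARED = {
    "jena-text-es": [("lucene-core", "6.4.1"), ("jena-text", "x.y.z")],
    "jena-text": [("lucene-core", "4.9.1")],
    "lucene-core": [],
}


def mediate(graph, root):
    """Pick one version per artifact: the declaration nearest to the root
    wins, and at equal depth the first declaration wins. A breadth-first
    walk visits declarations in exactly that priority order."""
    resolved = {}
    queue = deque(graph[root])
    while queue:
        artifact, version = queue.popleft()
        if artifact in resolved:
            continue  # a nearer (or earlier) declaration already won
        resolved[artifact] = version
        queue.extend(graph.get(artifact, []))
    return resolved


if __name__ == "__main__":
    print(mediate(DECLARED, "jena-text-es"))
```

In this model only one lucene-core version (6.4.1) survives, which is consistent with the build succeeding. It also means that jena-text classes written against Lucene 4.x would run against 6.4.1 at runtime, which appears to be the concern raised in the quoted replies.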

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
Hi Osma,
I just subscribed to the Dev mailing list.
I think you are right. If we know that there's no one using Solr it would
actually be wise to drop it in favour of ElasticSearch.

Thanks,
Anuj Kumar


On Fri, Mar 3, 2017 at 4:51 PM, Osma Suominen <os...@helsinki.fi>
wrote:

> Hi Anuj,
>
> Did you see my earlier message to the dev list? Are you subscribed to that
> list? I will Cc: you this time just to be sure. See
> http://jena.markmail.org/thread/uhs6cuhotzj4tjrj for the actual message
> in case you missed it (including some replies).
>
> I see what you mean by deprecating Solr first before removing it, but I
> can't figure out how that would work in practice. If you're right about
> Solr 4.9.1 requiring Lucene 4.9.1, then we can't have Solr and ES support
> in Jena at the same time - unless we upgrade the Solr side as well, which
> seems a bit of a waste of time if you're going to remove it anyway.
>
> Like I explained in JENA-1301 there are many problems with the Solr
> implementation and I doubt there are many users, quite possibly nobody at
> all.
>
> In any case switching indexing technologies for jena-text should be rather
> easy, as the text index itself doesn't need to be migrated - it can simply
> be rebuilt from the RDF data. So if someone runs, say, Fuseki 2.5.0 with a
> Solr index, then upgrading to (as yet hypothetical) Fuseki 2.6.0 with an ES
> index instead is just a matter of setting up ES, changing the text index
> configuration slightly and running jena.textindexer (or reloading the data,
> whichever is easier). There is no technical benefit from having support for
> both Solr and ES in the same Jena release as it doesn't make migration any
> easier, but of course, advance warning might help with planning the move to
> ES.
>
> -Osma
>
>
>
> 03.03.2017, 16:43, anuj kumar wrote:
>
>> Hey,
>> I just saw https://issues.apache.org/jira/browse/JENA-1301
>> Should we not first officially deprecate it and give any users of Solr a
>> chance to move to a different indexing technology?
>>
>> BTW, I don't know yet how to log in to Apache JIRA.
>>
>> Thanks,
>> Anuj Kumar
>>
>> On Fri, Mar 3, 2017 at 1:23 PM, anuj kumar <an...@gmail.com>
>> wrote:
>>
>>> Hi Osma,
>>> I briefly looked at the pull request. I believe we need to upgrade Lucene
>>> and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on
>>> Lucene 4.9.1.
>>>
>>> Also, how do I log into issues.apache.org, and where do I file this bug?
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <osma.suominen@helsinki.fi>
>>> wrote:
>>>
>>>> Hi Anuj,
>>>>
>>>> It's great that we found agreement over this!
>>>>
>>>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
>>>> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
>>>> as an intermediate step). I'll wait for comments on the PR and if people
>>>> think it's OK I will merge it soon to Jena master. Meanwhile, you can
>>>> already base your ES implementation on that branch [2] if you like.
>>>>
>>>> Could you please open a JIRA issue on issues.apache.org explaining the
>>>> Elasticsearch support feature, so that we have a place for tracking this
>>>> work, request comments etc.
>>>>
>>>> Also I suggest we move the discussion around this to the developers' list
>>>> (dev@jena.apache.org) where it's more appropriate.
>>>>
>>>> -Osma
>>>>
>>>> [1] https://github.com/apache/jena/pull/219
>>>>
>>>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>>>
>>>>
>>>> 03.03.2017, 02:45, anuj kumar wrote:
>>>>
>>>>> I second that. I am now finalising the integration of ES and should have
>>>>> a good production quality implementation ready in a week's time. At that
>>>>> time I would want you guys to have a look at the implementation and
>>>>> provide feedback. Once you guys have upgraded Lucene to 6.4.1, I can
>>>>> merge the code in jena-text module and do a round of testing.
>>>>>
>>>>> Thanks,
>>>>> Anuj Kumar
>>>>>
>>>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>>>
>>>>>> I do agree that trying to juggle different versions of Lucene libraries
>>>>>> is probably not a realistic option right now. Luckily (if I understand
>>>>>> the conversation thus far correctly) we have a solid alternative;
>>>>>> getting our current Lucene dependency upgraded should allow us to
>>>>>> (eventually) merge Anuj's work into the mainstream of development.
>>>>>> Someone please tell me if I have that wrong! :grin:
>>>>>>
>>>>>> Let me reiterate that this seems like very good work and speaking for
>>>>>> myself, I certainly want to get it included into Jena. It's just a
>>>>>> question of fitting it in correctly, which might take a bit of time.
>>>>>>
>>>>>> ---
>>>>>> A. Soroka
>>>>>> The University of Virginia Library
>>>>>>
>>>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Anuj!
>>>>>>>
>>>>>>> I have nothing against modularity in general. However, I cannot see
>>>>>>> how your proposal could work in practice for the Fuseki build, due to
>>>>>>> the reasons I mentioned in my previous message (and Adam seemed to
>>>>>>> concur).
>>>>>>>
>>>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>>>> again. If all current Jena modules (ie jena-text and jena-spatial)
>>>>>>> were upgraded to Lucene 6.4.1, then you could just add your ES classes
>>>>>>> to jena-text, right? I think that would be better for everyone than
>>>>>>> having to maintain your own separate module.
>>>>>>>
>>>>>>> -Osma
>>>>>>>
>>>>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>>>>
>>>>>>>> I personally have no preference as to how the code in Jena should be
>>>>>>>> structured, as long as I am able to use it :).
>>>>>>>> I have a personal preference for doing it in a specific way because,
>>>>>>>> IMO, it is modular, which makes it much easier to maintain in the
>>>>>>>> long run. But again it may not be the quickest one.
>>>>>>>>
>>>>>>>> I have already been given a deadline by the company to have the ES
>>>>>>>> extension implemented in the next 15 days :). What this means is that
>>>>>>>> I will be maintaining the ES code extension to Jena Text at least
>>>>>>>> locally for the coming period of time. I would be more than happy to
>>>>>>>> contribute to the Jena community whatever is required to have a
>>>>>>>> proper ElasticSearch Implementation in place, whether within the
>>>>>>>> jena-text module or as a separate module. Until Lucene and Solr are
>>>>>>>> upgraded to the latest version, I will have to maintain a separate
>>>>>>>> module for jena-text-es.
>>>>>>>>
>>>>>>>> Cheers!
>>>>>>>> Anuj Kumar
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Osma--
>>>>>>>>>
>>>>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>>>>> different versions of code accessible in different ways. The longer
>>>>>>>>> answer is that it's probably not a viable alternative for Jena for
>>>>>>>>> this problem, at least not without a lot of other change.
>>>>>>>>>
>>>>>>>>> You are right to point to the classloader mechanism as being at the
>>>>>>>>> heart of this question, but I must alter your remark just slightly.
>>>>>>>>> From "the Java classloader only sees a single, flat package/class
>>>>>>>>> namespace and a set of compiled classes" to "ANY GIVEN Java
>>>>>>>>> classloader only sees a single, flat package/class namespace and a
>>>>>>>>> set of compiled classes".
>>>>>>>>>
>>>>>>>>> This is the fact that OSGi uses to make it possible to maintain
>>>>>>>>> strict module boundaries (and even dynamic module relationships at
>>>>>>>>> run-time). Each OSGi bundle sees its own classloader, and the
>>>>>>>>> framework is responsible for connecting bundles up to ensure that
>>>>>>>>> every bundle has what it needs in the way of types to function,
>>>>>>>>> based on metadata that the bundles provide to the framework. It's
>>>>>>>>> an incredibly powerful system (I use it every day and enjoy it
>>>>>>>>> enormously) but it's also very "heavy" and requires a good deal of
>>>>>>>>> investment to use. In particular, it's probably too large to put
>>>>>>>>> _inside_ Jena. (I frequently put Jena inside an OSGi instance, on
>>>>>>>>> the other hand.)
>>>>>>>>>
>>>>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization
>>>>>>>>> of this kind, but it's really meant for the JDK itself, not
>>>>>>>>> application libraries. In theory, we could "roll our own"
>>>>>>>>> classloader management for this problem. That sounds like more than
>>>>>>>>> a bit of a rabbit hole to me. There might be another, more
>>>>>>>>> lightweight, toolkit out there to this purpose, but I'm not aware
>>>>>>>>> of any myself.
>>>>>>>>>
>>>>>>>>> Otherwise, yes, you get into shading and the like. We have to do
>>>>>>>>> that for Guava for now because of HADOOP-10101 (grumble grumble)
>>>>>>>>> but it's hardly a thing we want to do any more of than needed, I
>>>>>>>>> don't think.
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> A. Soroka
>>>>>>>>> The University of Virginia Library
>>>>>>>>>
>>>>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>>>>
>>>>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suominen@helsinki.fi>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Anuj!
>>>>>>>>>>
>>>>>>>>>> Thanks for the clarification.
>>>>>>>>>>
>>>>>>>>>> However, I'm still not sure I understand the situation completely.
>>>>>>>>>> I know Maven can perform a lot of tricks, but Maven modules are
>>>>>>>>>> just convenient ways to structure a Java project. Maven cannot
>>>>>>>>>> change the fact that at runtime, module divisions don't really
>>>>>>>>>> matter (except that they usually correspond to package
>>>>>>>>>> sub-namespaces) and the Java classloader only sees a single, flat
>>>>>>>>>> package/class namespace and a set of compiled classes (usually
>>>>>>>>>> within JARs) in the classpath that it needs to check to find the
>>>>>>>>>> right classes, and if there are two versions of the same library
>>>>>>>>>> (eg Lucene) with overlapping class names, that's going to cause
>>>>>>>>>> trouble. The only way around that is to shade some of the
>>>>>>>>>> libraries, i.e. rename them so that they end up in another,
>>>>>>>>>> non-conflicting namespace. Apparently Elasticsearch also did some
>>>>>>>>>> of that in the past [1] but nowadays tries to avoid it.
>>>>>>>>>>
>>>>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>>>>>>> Lucene and ES backends? Or how do you run something like Fuseki
>>>>>>>>>> that contains (in a single big JAR) both the jena-text and
>>>>>>>>>> jena-text-es modules with all their dependencies, one of which
>>>>>>>>>> requires the Lucene 4.x classes and the other one the Lucene 6.4.1
>>>>>>>>>> classes? How do you ensure that only one of them is used at a
>>>>>>>>>> time, and that the Java classloader, even though it has access to
>>>>>>>>>> both versions of Lucene, only loads classes from the single,
>>>>>>>>>> correct one and not the other? Or do you need to have separate
>>>>>>>>>> "Fuseki-Lucene" and "Fuseki-ES" packages, so that you don't end up
>>>>>>>>>> with two Lucene versions within the same Fuseki JAR?
>>>>>>>>>>
>>>>>>>>>> -Osma
>>>>>>>>>>
>>>>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>>>>
>>>>>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Osma,
>>>>>>>>>>>
>>>>>>>>>>> I understand what you are saying. There are ways to mitigate
>>>>>>>>>>> risks and balance the refactoring without affecting the existing
>>>>>>>>>>> modules. But I will not delve into those now. I am not enough of
>>>>>>>>>>> a Jena expert to say convincingly that it is possible without any
>>>>>>>>>>> hiccups. But I can take a guess and say that it is indeed
>>>>>>>>>>> possible :)
>>>>>>>>>>>
>>>>>>>>>>> For the question: "is it even possible to mix modules that depend
>>>>>>>>>>> on different versions of the Lucene libraries within the same
>>>>>>>>>>> project?"
>>>>>>>>>>>
>>>>>>>>>>> I actually do not understand what you mean by mixing modules. I
>>>>>>>>>>> assume you mean having jena-text and jena-text-es as dependencies
>>>>>>>>>>> in a build without causing the build to conflict. If that is what
>>>>>>>>>>> you mean, then the answer is yes, it is possible and quite simple
>>>>>>>>>>> as well. Let me explain how it is possible. But before that, some
>>>>>>>>>>> assumptions which I want to call out explicitly.
>>>>>>>>>>>
>>>>>>>>>>> *Assumptions:*
>>>>>>>>>>> 1. At a given point in time, only a single Indexing Technology is
>>>>>>>>>>> used for text based indexing and searching via Jena. What this
>>>>>>>>>>> means is that we will either use Lucene Implementation OR Solr
>>>>>>>>>>> Implementation OR ES Implementation at any given point in time.
>>>>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific
>>>>>>>>>>> classes but only on jena-text classes, if at all.
>>>>>>>>>>>
>>>>>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>>>>>> contains jena-text based common classes + ES specific classes
>>>>>>>>>>> without any compatibility issues. And it is in fact quite simple.
>>>>>>>>>>> I did it in the current jena-text-es module and ran the entire
>>>>>>>>>>> build, which succeeded. The key is to include the latest Lucene
>>>>>>>>>>> dependencies at the very beginning in the pom and then include
>>>>>>>>>>> the jena-text dependency. Maven will then automatically resolve
>>>>>>>>>>> the dependency issues by including the Lucene libraries that we
>>>>>>>>>>> included in our ES specific pom. Have a look at the pom of the
>>>>>>>>>>> jena-text-es module here to see how it can be done:
>>>>>>>>>>
>>>>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/po
>>>>>>>>>>> m.xml
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
>>>>>>>>>>>
>>>>>>>>>>> osma.suominen@helsinki.fi>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Anuj,
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I understand your concerns. However, we also need to balance
>>>>>>>>>>>> between
>>>>>>>>>>>>
>>>>>>>>>>>> the
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> needs of individual modules/features and the whole codebase. I'm
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> willing to
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> put in the effort to keep the other modules up to date with newer
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Lucene
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> versions. Lucene upgrade requirements are well documented, the only
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> hitches
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some
>>>>>>>>>> Lucene
>>>>>>>>>>
>>>>>>>>>>> features that were dropped from newer versions.
>>>>>>>>>>>>
>>>>>>>>>>>> A perhaps stupid question to more experienced Java developers:
>>>>>>>>>>>> is
>>>>>>>>>>>> it
>>>>>>>>>>>>
>>>>>>>>>>>> even
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> possible to mix modules that depend on different versions of the
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Lucene
>>>>>>>>>>>
>>>>>>>>>>
>>>>>> libraries within the same project? In my (quite limited)
>>>>>>>
>>>>>>>>
>>>>>>>>>>>> understanding
>>>>>>>>>>>
>>>>>>>>>>
>>>>>> of
>>>>>>>
>>>>>>>>
>>>>>>>>> Java projects and libraries, this requires special arrangements
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> (e.g.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>> shading) as the Java package/class namespace is shared by all the
>>>>>>>
>>>>>>>>
>>>>>>>>>>>> code
>>>>>>>>>>>
>>>>>>>>>>
>>>>>> running within the same JVM.
>>>>>>>
>>>>>>>>
>>>>>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>>>>>>
>>>>>>>>>>>> jena-text
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> module (depending on Lucene 4.x) and the new jena-text-es module
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> (depending
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -Osma
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 01.03.2017, 00:47, anuj kumar kirjoitti:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> My 2 Cents :
>>>>>>>>>>>>>
>>>>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr
>>>>>>>>>>>>> and
>>>>>>>>>>>>>
>>>>>>>>>>>>> ES is
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> exactly for avoiding the "All or Nothing" approach we need to take
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> if
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> we
>>>>>>>
>>>>>>>>
>>>>>>>>> club them all together. If they stay together and if in the near
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> future I
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> want to upgrade ES to another version, I also need to again upgrade
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Lucene
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> and Solr and possibly another implementation that may have been
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> added
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> during the time. As we all know, this means weeks of work if not
>>>>>>>
>>>>>>>>
>>>>>>>>>>>>> months to
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>>>>>>
>>>>>>>>>>> anything and I will probably start maintaining my version of
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jena-Text as
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> that would be much simpler to do than to upgrade and test and in
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> process own(read fix bugs) the upgrade for each and every
>>>>>>>
>>>>>>>>
>>>>>>>>>>>>> technology.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>
>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>>>>>>
>>>>>>>>>>>>> independently
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> of
>>>>>>>>>>
>>>>>>>>>>> each other and we can avoid situations where we cant upgrade to
>>>>>>>>>>>>>
>>>>>>>>>>>>> latest
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> version of Lucene because we do not know what effect it will have
>>>>>>>
>>>>>>>>
>>>>>>>>>>>>> on
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> Solr
>>>>>>>
>>>>>>>>
>>>>>>>>> Implementation.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> We can start with having a separate Module for Jena Text ES and
>>>>>>>>>>>>> see
>>>>>>>>>>>>>
>>>>>>>>>>>>> how
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> things go. If they go well, we could extract out Solr and Lucene
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> out
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> of
>>>>>>>
>>>>>>>>
>>>>>>>>> Jena Text.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>>>>>>
>>>>>>>>>>>>> experience.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
>>>>>>>>>>>>>
>>>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> bb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> apache.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>> %3E
>>>>>>>
>>>>>>>>
>>>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> volunteer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> to do
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>> the actual work!
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> doing...
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> work to
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>> maintain it, so consider that just a very small and blurry data
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> point.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>> :)
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure
>>>>>>>>>>>>>>> out
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> how
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>> to
>>>>>>>
>>>>>>>>
>>>>>>>>> get
>>>>>>>>>>
>>>>>>>>>>> it running... If you could just try that with some toy data,
>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> your
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>> data
>>>>>>>>>>
>>>>>>>>>>> point would be a lot less blurry :) I haven't used Solr for
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> anything, so
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>> I'm not very familiar with how to set it up, and the jena-text
>>>>>>>>>>
>>>>>>>>>>> instructions
>>>>>>>>>>>>>> are pretty vague unfortunately.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Osma
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Osma Suominen
>>>>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>>>>> National Library of Finland
>>>>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> Osma Suominen
>>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>>> National Library of Finland
>>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Osma Suominen
>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>> National Library of Finland
>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>> Osma Suominen
>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>> National Library of Finland
>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>> Tel. +358 50 3199529
>>>>>>> osma.suominen@helsinki.fi
>>>>>>> http://www.nationallibrary.fi
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> Osma Suominen
>>>> D.Sc. (Tech), Information Systems Specialist
>>>> National Library of Finland
>>>> P.O. Box 26 (Kaikukatu 4)
>>>> 00014 HELSINGIN YLIOPISTO
>>>> Tel. +358 50 3199529
>>>> osma.suominen@helsinki.fi
>>>> http://www.nationallibrary.fi
>>>>
>>>>
>>>
>>>
>>> --
>>> *Anuj Kumar*
>>>
>>>
>>
>>
>>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suominen@helsinki.fi
> http://www.nationallibrary.fi
>



-- 
*Anuj Kumar*

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Anuj,

Did you see my earlier message to the dev list? Are you subscribed to 
that list? I will Cc: you this time just to be sure. See 
http://jena.markmail.org/thread/uhs6cuhotzj4tjrj for the actual message 
in case you missed it (including some replies).

I see what you mean by deprecating Solr first before removing it, but I 
can't figure out how that would work in practice. If you're right about 
Solr 4.9.1 requiring Lucene 4.9.1, then we can't have Solr and ES 
support in Jena at the same time - unless we upgrade the Solr side as 
well, which seems a bit of a waste of time if you're going to remove it 
anyway.
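
To make the constraint concrete, here is a minimal Java sketch. It assumes, as suggested in this thread, that the Solr side needs Lucene 4.9.1 while the ES side needs Lucene 6.4.1. The class name and the module names are hypothetical and not part of Jena. The check only reports whether every module bundled into one build agrees on a single Lucene version, which is what a single flat classpath requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LuceneVersionCheck {
    /** Collects the distinct Lucene versions required by the given modules. */
    static Set<String> requiredLuceneVersions(Map<String, String> moduleToLucene) {
        return new TreeSet<>(moduleToLucene.values());
    }

    /** A single flat classpath can hold only one version of a library. */
    static boolean canShareClasspath(Map<String, String> moduleToLucene) {
        return requiredLuceneVersions(moduleToLucene).size() <= 1;
    }

    public static void main(String[] args) {
        // Hypothetical module names; versions as discussed in this thread.
        Map<String, String> build = new LinkedHashMap<>();
        build.put("solr-backend", "4.9.1");
        build.put("es-backend", "6.4.1");
        System.out.println(canShareClasspath(build)); // prints false
    }
}
```

Once both entries name the same Lucene version, the check passes, which corresponds to either upgrading the Solr side or dropping it.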

As I explained in JENA-1301, there are many problems with the Solr 
implementation, and I doubt there are many users, quite possibly nobody 
at all.

In any case switching indexing technologies for jena-text should be 
rather easy, as the text index itself doesn't need to be migrated - it 
can simply be rebuilt from the RDF data. So if someone runs, say, Fuseki 
2.5.0 with a Solr index, then upgrading to (as yet hypothetical) Fuseki 
2.6.0 with an ES index instead is just a matter of setting up ES, 
changing the text index configuration slightly and running 
jena.textindexer (or reloading the data, whichever is easier). There is 
no technical benefit from having support for both Solr and ES in the 
same Jena release as it doesn't make migration any easier, but of 
course, advance warning might help with planning the move to ES.
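
The point that the text index is derived data can be sketched in Java as follows. This is a toy model under stated assumptions, not Jena's API: `TextIndex`, `InMemoryIndex` and `rebuild` are hypothetical names, and the in-memory map merely stands in for a real backend such as Lucene, Solr or ES. It shows that any backend can be repopulated from the (subject, literal) pairs held in the RDF data, so nothing has to be migrated from the old index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RebuildIndexSketch {
    /** Hypothetical stand-in for a text index backend. */
    interface TextIndex {
        void add(String subject, String literal);
        Set<String> search(String term);
    }

    /** Toy backend: maps each lower-cased token to the subjects containing it. */
    static class InMemoryIndex implements TextIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        @Override
        public void add(String subject, String literal) {
            for (String token : literal.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, k -> new TreeSet<>()).add(subject);
                }
            }
        }

        @Override
        public Set<String> search(String term) {
            Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
            return hits == null ? Collections.emptySet() : hits;
        }
    }

    /** Fills an empty backend from (subject, literal) pairs taken from the RDF data. */
    static TextIndex rebuild(List<String[]> labelledSubjects, TextIndex target) {
        for (String[] pair : labelledSubjects) {
            target.add(pair[0], pair[1]);
        }
        return target;
    }

    public static void main(String[] args) {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"ex:book1", "Semantic Web Primer"});
        data.add(new String[] {"ex:book2", "Web Data Mining"});
        // Switching backends means passing a different TextIndex implementation here.
        TextIndex fresh = rebuild(data, new InMemoryIndex());
        System.out.println(fresh.search("web"));
    }
}
```

In a real deployment the rebuild step is what jena.textindexer does against the configured backend; the sketch only mirrors the idea.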

-Osma


03.03.2017, 16:43, anuj kumar wrote:
> Hey,
>  I just saw https://issues.apache.org/jira/browse/JENA-1301
> Should we not first officially deprecate it and give any users of Solr a
> chance to move to a different indexing technology?
>
> BTW, I don't know yet how to log in to Apache JIRA.
>
> Thanks,
> Anuj Kumar
>
> On Fri, Mar 3, 2017 at 1:23 PM, anuj kumar <an...@gmail.com> wrote:
>
>> Hi Osma,
>> I briefly looked at the pull request. I believe we need to upgrade Lucene
>> and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on
>> Lucene 4.9.1.
>>
>> Also, how do I log in to issues.apache.org, and where do I file this bug?
>>
>> Thanks,
>> Anuj Kumar
>>
>> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <os...@helsinki.fi>
>> wrote:
>>
>>> Hi Anuj,
>>>
>>> It's great that we found agreement over this!
>>>
>>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
>>> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
>>> as an intermediate step). I'll wait for comments on the PR and if people
>>> think it's OK I will merge it soon to Jena master. Meanwhile, you can
>>> already base your ES implementation on that branch [2] if you like.
>>>
>>> Could you please open a JIRA issue on issues.apache.org explaining the
>>> Elasticsearch support feature, so that we have a place for tracking this
>>> work, requesting comments etc.?
>>>
>>> Also I suggest we move the discussion around this to the developers' list
>>> (dev@jena.apache.org) where it's more appropriate.
>>>
>>> -Osma
>>>
>>> [1] https://github.com/apache/jena/pull/219
>>>
>>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>>
>>>
>>> 03.03.2017, 02:45, anuj kumar wrote:
>>>
>>>> I second that. I am now finalising the integration of ES and should have
>>>> a good production quality implementation ready in a week's time. At that
>>>> time I would want you guys to have a look at the implementation and
>>>> provide feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge
>>>> the code in jena-text module and do a round of testing.
>>>>
>>>> Thanks,
>>>> Anuj Kumar
>>>>
>>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>>
>>>>> I do agree that trying to juggle different versions of Lucene libraries
>>>>> is probably not a realistic option right now. Luckily (if I understand
>>>>> the conversation thus far correctly) we have a solid alternative; getting
>>>>> our current Lucene dependency upgraded should allow us to (eventually)
>>>>> merge Anuj's work into the mainstream of development. Someone please tell
>>>>> me if I have that wrong! :grin:
>>>>>
>>>>> Let me reiterate that this seems like very good work and speaking for
>>>>> myself, I certainly want to get it included into Jena. It's just a
>>>>> question of fitting it in correctly, which might take a bit of time.
>>>>>
>>>>> ---
>>>>> A. Soroka
>>>>> The University of Virginia Library
>>>>>
>>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>>
>>>>>> Hi Anuj!
>>>>>>
>>>>>> I have nothing against modularity in general. However, I cannot see how
>>>>>> your proposal could work in practice for the Fuseki build, due to the
>>>>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>>>
>>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>>> again. If all current Jena modules (i.e. jena-text and jena-spatial)
>>>>>> were upgraded to Lucene 6.4.1, then you could just add your ES classes
>>>>>> to jena-text, right? I think that would be better for everyone than
>>>>>> having to maintain your own separate module.
>>>>>>
>>>>>> -Osma
>>>>>>
>>>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>>>
>>>>>>> I personally have no preference as to how the code in Jena should be
>>>>>>> structured, as long as I am able to use it :).
>>>>>>> I have a personal preference for doing it in a specific way because,
>>>>>>> IMO, it is modular, which makes it much easier to maintain in the long
>>>>>>> run. But again, it may not be the quickest one.
>>>>>>>
>>>>>>> I have already been given a deadline by the company to have the ES
>>>>>>> extension implemented in the next 15 days :). What this means is that I
>>>>>>> will be maintaining the ES code extension to Jena Text at least locally
>>>>>>> for a coming period of time. I would be more than happy to contribute to
>>>>>>> the Jena community whatever is required to have a proper ElasticSearch
>>>>>>> implementation in place, whether within the jena-text module or as a
>>>>>>> separate module. Until Lucene and Solr are upgraded to the latest
>>>>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>>>>
>>>>>>> Cheers!
>>>>>>> Anuj Kumar
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>>>
>>>>>>>> Osma--
>>>>>>>>
>>>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>>>> different versions of code accessible in different ways. The longer
>>>>>>>> answer is that it's probably not a viable alternative for Jena for this
>>>>>>>> problem, at least not without a lot of other change.
>>>>>>>>
>>>>>>>> You are right to point to the classloader mechanism as being at the
>>>>>>>> heart of this question, but I must alter your remark just slightly.
>>>>>>>> From "the Java classloader only sees a single, flat package/class
>>>>>>>> namespace and a set of compiled classes" to "ANY GIVEN Java classloader
>>>>>>>> only sees a single, flat package/class namespace and a set of compiled
>>>>>>>> classes".
>>>>>>>>
>>>>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>>>>> module boundaries (and even dynamic module relationships at run-time).
>>>>>>>> Each OSGi bundle sees its own classloader, and the framework is
>>>>>>>> responsible for connecting bundles up to ensure that every bundle has
>>>>>>>> what it needs in the way of types to function, based on metadata that
>>>>>>>> the bundles provide to the framework. It's an incredibly powerful
>>>>>>>> system (I use it every day and enjoy it enormously) but it's also very
>>>>>>>> "heavy" and requires a good deal of investment to use. In particular,
>>>>>>>> it's probably too large to put _inside_ Jena. (I frequently put Jena
>>>>>>>> inside an OSGi instance, on the other hand.)
>>>>>>>>
>>>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>>>>> libraries. In theory, we could "roll our own" classloader management
>>>>>>>> for this problem. That sounds like more than a bit of a rabbit hole to
>>>>>>>> me. There might be another, more lightweight, toolkit out there to this
>>>>>>>> purpose, but I'm not aware of any myself.
>>>>>>>>
>>>>>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>>>>>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
>>>>>>>> hardly a thing we want to do any more of than needed, I don't think.
>>>>>>>>
>>>>>>>> ---
>>>>>>>> A. Soroka
>>>>>>>> The University of Virginia Library
>>>>>>>>
>>>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>>>
>>>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suominen@helsinki.fi>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Anuj!
>>>>>>>>>
>>>>>>>>> Thanks for the clarification.
>>>>>>>>>
>>>>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>>>>> convenient ways to structure a Java project. Maven cannot change the
>>>>>>>>> fact that at runtime, module divisions don't really matter (except
>>>>>>>>> that they usually correspond to package sub-namespaces) and the Java
>>>>>>>>> classloader only sees a single, flat package/class namespace and a set
>>>>>>>>> of compiled classes (usually within JARs) in the classpath that it
>>>>>>>>> needs to check to find the right classes, and if there are two
>>>>>>>>> versions of the same library (e.g. Lucene) with overlapping class
>>>>>>>>> names, that's going to cause trouble. The only way around that is to
>>>>>>>>> shade some of the libraries, i.e. rename them so that they end up in
>>>>>>>>> another, non-conflicting namespace. Apparently Elasticsearch also did
>>>>>>>>> some of that in the past [1] but nowadays tries to avoid it.
>>>>>>>>>
>>>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>>>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>>>>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>>>>>>> modules with all their dependencies, one of which requires the Lucene
>>>>>>>>> 4.x classes and the other one the Lucene 6.4.1 classes? How do you
>>>>>>>>> ensure that only one of them is used at a time, and that the Java
>>>>>>>>> classloader, even though it has access to both versions of Lucene,
>>>>>>>>> only loads classes from the single, correct one and not the other? Or
>>>>>>>>> do you need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages,
>>>>>>>>> so that you don't end up with two Lucene versions within the same
>>>>>>>>> Fuseki JAR?
>>>>>>>>>
>>>>>>>>> -Osma
>>>>>>>>>
>>>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>>>
>>>>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>>>>
>>>>>>>>>> Hi Osma,
>>>>>>>>>>
>>>>>>>>>> I understand what you are saying. There are ways to mitigate risks
>>>>>>>>>> and balance the refactoring without affecting the existing modules.
>>>>>>>>>> But I will not delve into those now. I am not an expert in Jena to
>>>>>>>>>> convincingly say that it is possible, without any hiccups. But I can
>>>>>>>>>> take a guess and say that it is indeed possible :)
>>>>>>>>>>
>>>>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>>>>
>>>>>>>>>> I actually do not understand what you mean by mixing modules. I
>>>>>>>>>> assume you mean having jena-text and jena-text-es as dependencies in
>>>>>>>>>> a build without causing the build to conflict. If that is what you
>>>>>>>>>> mean, then the answer is yes, it is possible and quite simple as
>>>>>>>>>> well. Let me explain how it is possible. But before that, some
>>>>>>>>>> assumptions which I want to call out explicitly.
>>>>>>>>>>
>>>>>>>>>> *Assumptions:*
>>>>>>>>>> 1. At a given point in time, only a single Indexing Technology is
>>>>>>>>>> used for text based indexing and searching via Jena. What this means
>>>>>>>>>> is that we will either use Lucene Implementation OR Solr
>>>>>>>>>> Implementation OR ES Implementation at any given point in time.
>>>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>>>>>> but only on jena-text classes, if at all.
>>>>>>>>>>
>>>>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>>>>> contains jena-text based common classes + ES specific classes without
>>>>>>>>>> any compatibility issues. And it is in fact quite simple. I did it in
>>>>>>>>>> the current jena-text-es module and ran the entire build, which
>>>>>>>>>> succeeded. The key is to include the latest Lucene dependencies at
>>>>>>>>>> the very beginning in the pom and then include the jena-text
>>>>>>>>>> dependency. Maven will then automatically resolve the dependency
>>>>>>>>>> issues by including the Lucene libraries that we included in our ES
>>>>>>>>>> specific pom. Have a look at the pom of the jena-text-es module here
>>>>>>>>>> to see how it can be done:
>>>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Anuj Kumar
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen
>>>>>>>>>> <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Anuj,
>>>>>>>>>>>
>>>>>>>>>>> I understand your concerns. However, we also need to balance between
>>>>>>>>>>> the needs of individual modules/features and the whole codebase. I'm
>>>>>>>>>>> willing to put in the effort to keep the other modules up to date
>>>>>>>>>>> with newer Lucene versions. Lucene upgrade requirements are well
>>>>>>>>>>> documented, the only hitches seen in JENA-1250 were related to how
>>>>>>>>>>> jena-text (ab)used some Lucene features that were dropped from newer
>>>>>>>>>>> versions.
>>>>>>>>>>>
>>>>>>>>>>> A perhaps stupid question to more experienced Java developers: is it
>>>>>>>>>>> even possible to mix modules that depend on different versions of
>>>>>>>>>>> the Lucene libraries within the same project? In my (quite limited)
>>>>>>>>>>> understanding of Java projects and libraries, this requires special
>>>>>>>>>>> arrangements (e.g. shading) as the Java package/class namespace is
>>>>>>>>>>> shared by all the code running within the same JVM.
>>>>>>>>>>>
>>>>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>>>>> jena-text module (depending on Lucene 4.x) and the new jena-text-es
>>>>>>>>>>> module (depending on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>>>
>>>>>>>>>>> -Osma
>>>>>>>>>>>
>>>>>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> My 2 Cents :
>>>>>>>>>>>>
>>>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and
>>>>>>>>>>>> ES is exactly for avoiding the "All or Nothing" approach we need to
>>>>>>>>>>>> take if we club them all together. If they stay together and if in
>>>>>>>>>>>> the near future I want to upgrade ES to another version, I also
>>>>>>>>>>>> need to again upgrade Lucene and Solr and possibly another
>>>>>>>>>>>> implementation that may have been added during the time. As we all
>>>>>>>>>>>> know, this means weeks of work if not months to get the changes
>>>>>>>>>>>> released. This will personally de-motivate me to do anything and I
>>>>>>>>>>>> will probably start maintaining my version of Jena-Text as that
>>>>>>>>>>>> would be much simpler to do than to upgrade and test and in the
>>>>>>>>>>>> process own (read: fix bugs) the upgrade for each and every
>>>>>>>>>>>> technology.
>>>>>>>>>>>>
>>>>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>>>>> independently of each other and we can avoid situations where we
>>>>>>>>>>>> can't upgrade to the latest version of Lucene because we do not
>>>>>>>>>>>> know what effect it will have on the Solr implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> We can start with having a separate Module for Jena Text ES and see
>>>>>>>>>>>> how things go. If they go well, we could extract Solr and Lucene
>>>>>>>>>>>> out of Jena Text.
>>>>>>>>>>>>
>>>>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>>>>> experience.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen
>>>>>>>>>>>> <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>>>>> and -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer
>>>>>>>>>>>>> to do the actual work!
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>>>>> doing... that's pretty vague, I know, and I'm not in a position
>>>>>>>>>>>>>> to do any work to maintain it, so consider that just a very small
>>>>>>>>>>>>>> and blurry data point. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out
>>>>>>>>>>>>> how to get it running... If you could just try that with some toy
>>>>>>>>>>>>> data, then your data point would be a lot less blurry :) I haven't
>>>>>>>>>>>>> used Solr for anything, so I'm not very familiar with how to set
>>>>>>>>>>>>> it up, and the jena-text instructions are pretty vague
>>>>>>>>>>>>> unfortunately.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Osma
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Osma Suominen
>>>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>>>> National Library of Finland
>>>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Osma Suominen
>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>> National Library of Finland
>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Osma Suominen
>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>> National Library of Finland
>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Osma Suominen
>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>> National Library of Finland
>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>> Tel. +358 50 3199529
>>>>>> osma.suominen@helsinki.fi
>>>>>> http://www.nationallibrary.fi
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Osma Suominen
>>> D.Sc. (Tech), Information Systems Specialist
>>> National Library of Finland
>>> P.O. Box 26 (Kaikukatu 4)
>>> 00014 HELSINGIN YLIOPISTO
>>> Tel. +358 50 3199529
>>> osma.suominen@helsinki.fi
>>> http://www.nationallibrary.fi
>>>
>>
>>
>>
>> --
>> *Anuj Kumar*
>>
>
>
>


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
Hey,
 I just saw https://issues.apache.org/jira/browse/JENA-1301
Should we not first officially deprecate it and give any users of Solr a
chance to move to a different indexing technology?

BTW, I don't know yet how to log in to Apache JIRA.

Thanks,
Anuj Kumar

On Fri, Mar 3, 2017 at 1:23 PM, anuj kumar <an...@gmail.com> wrote:

> Hi Osma,
>  I briefly looked at the pull request. I believe we need to upgrade Lucene
> and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on Lucene
> 4.9.1.
>
> Also, how do I log in to issues.apache.org, and where do I file this issue?
>
> Thanks,
> Anuj Kumar
>
> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <os...@helsinki.fi>
> wrote:
>
>> Hi Anuj,
>>
>> It's great that we found agreement over this!
>>
>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
>> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
>> as an intermediate step). I'll wait for comments on the PR and if people
>> think it's OK I will merge it soon to Jena master. Meanwhile, you can
>> already base your ES implementation on that branch [2] if you like.
>>
>> Could you please open a JIRA issue on issues.apache.org explaining the
>> Elasticsearch support feature, so that we have a place for tracking this
>> work, requesting comments etc.?
>>
>> Also I suggest we move the discussion around this to the developers' list
>> (dev@jena.apache.org) where it's more appropriate.
>>
>> -Osma
>>
>> [1] https://github.com/apache/jena/pull/219
>>
>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>
>>
>> 03.03.2017, 02:45, anuj kumar wrote:
>>
>>> I second that. I am now finalising the integration of ES and should have a
>>> good production-quality implementation ready in a week's time. At that
>>> time I would want you guys to have a look at the implementation and provide
>>> feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge the
>>> code into the jena-text module and do a round of testing.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>
>>>> I do agree that trying to juggle different versions of Lucene libraries is
>>>> probably not a realistic option right now. Luckily (if I understand the
>>>> conversation thus far correctly) we have a solid alternative; getting our
>>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>>> Anuj's work into the mainstream of development. Someone please tell me if I
>>>> have that wrong! :grin:
>>>>
>>>> Let me reiterate that this seems like very good work and speaking for
>>>> myself, I certainly want to get it included into Jena. It's just a question
>>>> of fitting it in correctly, which might take a bit of time.
>>>>
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>>
>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>
>>>>> Hi Anuj!
>>>>>
>>>>> I have nothing against modularity in general. However, I cannot see how
>>>>> your proposal could work in practice for the Fuseki build, due to the
>>>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>>
>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>> again. If all current Jena modules (i.e. jena-text and jena-spatial) were
>>>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>>>> jena-text, right? I think that would be better for everyone than having to
>>>>> maintain your own separate module.
>>>>>
>>>>> -Osma
>>>>>
>>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>>
>>>>>> I personally have no preference as to how the code in Jena should be
>>>>>> structured, as long as I am able to use it :).
>>>>>> I have a personal preference for doing it in a specific way because, IMO,
>>>>>> it is modular, which makes it much easier to maintain in the long run. But
>>>>>> again, it may not be the quickest one.
>>>>>>
>>>>>> I have already been given a deadline by the company to have the ES extension
>>>>>> implemented in the next 15 days :). What this means is that I will be
>>>>>> maintaining the ES code extension to Jena Text at least locally for the
>>>>>> coming period of time. I would be more than happy to contribute to the Jena
>>>>>> community whatever is required to have a proper ElasticSearch
>>>>>> implementation in place, whether within the jena-text module or as a
>>>>>> separate module. Until Lucene and Solr are upgraded to the latest
>>>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>>>
>>>>>> Cheers!
>>>>>> Anuj Kumar
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>>
>>>>>>> Osma--
>>>>>>>
>>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>>> different versions of code accessible in different ways. The longer answer
>>>>>>> is that it's probably not a viable alternative for Jena for this problem,
>>>>>>> at least not without a lot of other change.
>>>>>>>
>>>>>>> You are right to point to the classloader mechanism as being at the heart
>>>>>>> of this question, but I must alter your remark just slightly. From "the
>>>>>>> Java classloader only sees a single, flat package/class namespace and a set
>>>>>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>>>>>> flat package/class namespace and a set of compiled classes".
>>>>>>>
>>>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>>>> module boundaries (and even dynamic module relationships at run-time). Each
>>>>>>> OSGi bundle sees its own classloader, and the framework is responsible for
>>>>>>> connecting bundles up to ensure that every bundle has what it needs in the
>>>>>>> way of types to function, based on metadata that the bundles provide to the
>>>>>>> framework. It's an incredibly powerful system (I use it every day and enjoy
>>>>>>> it enormously) but it's also very "heavy" and requires a good deal of
>>>>>>> investment to use. In particular, it's probably too large to put _inside_
>>>>>>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>>>>>>
>>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>>>> libraries. In theory, we could "roll our own" classloader management for
>>>>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>>>>> There might be another, more lightweight, toolkit out there to this
>>>>>>> purpose, but I'm not aware of any myself.
>>>>>>>
>>>>>>> Otherwise, yes, you get into shading and the like. We have to do that for
>>>>>>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>>>>>>> thing we want to do any more of than needed, I don't think.
>>>>>>>
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>>
>>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>>
>>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suominen@helsinki.fi> wrote:
>>>>>>>
>>>>>>>> Hi Anuj!
>>>>>>>>
>>>>>>>> Thanks for the clarification.
>>>>>>>>
>>>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>>>> convenient ways to structure a Java project. Maven cannot change the fact
>>>>>>>> that at runtime, module divisions don't really matter (except that they
>>>>>>>> usually correspond to package sub-namespaces) and the Java classloader only
>>>>>>>> sees a single, flat package/class namespace and a set of compiled classes
>>>>>>>> (usually within JARs) in the classpath that it needs to check to find the
>>>>>>>> right classes, and if there are two versions of the same library (e.g.
>>>>>>>> Lucene) with overlapping class names, that's going to cause trouble. The
>>>>>>>> only way around that is to shade some of the libraries, i.e. rename them so
>>>>>>>> that they end up in another, non-conflicting namespace. Apparently
>>>>>>>> Elasticsearch also did some of that in the past [1] but nowadays tries to
>>>>>>>> avoid it.
>>>>>>>>
>>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>>> Indexing Technology is used") imply that in the assembler configuration,
>>>>>>>> you cannot have ja:loadClass declarations for both Lucene and ES backends?
>>>>>>>> Or how do you run something like Fuseki that contains (in a single big JAR)
>>>>>>>> both the jena-text and jena-text-es modules with all their dependencies,
>>>>>>>> one of which requires the Lucene 4.x classes and the other one the Lucene
>>>>>>>> 6.4.1 classes? How do you ensure that only one of them is used at a time,
>>>>>>>> and that the Java classloader, even though it has access to both versions
>>>>>>>> of Lucene, only loads classes from the single, correct one and not the
>>>>>>>> other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
>>>>>>>> packages, so that you don't end up with two Lucene versions within the same
>>>>>>>> Fuseki JAR?
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>>
>>>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>>>
>>>>>>>>> Hi Osma,
>>>>>>>>>
>>>>>>>>> I understand what you are saying. There are ways to mitigate risks and
>>>>>>>>> balance the refactoring without affecting the existing modules. But I will
>>>>>>>>> not delve into those now. I am not an expert in Jena to convincingly say
>>>>>>>>> that it is possible, without any hiccups. But I can take a guess and say
>>>>>>>>> that it is indeed possible :)
>>>>>>>>>
>>>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>>>
>>>>>>>>> I actually do not understand what you mean by mixing modules. I assume you
>>>>>>>>> mean having jena-text and jena-text-es as dependencies in a build without
>>>>>>>>> causing the build to conflict. If that is what you mean, then the answer is
>>>>>>>>> yes, it is possible and quite simple as well. Let me explain how it is
>>>>>>>>> possible. But before that, some assumptions which I want to call out
>>>>>>>>> explicitly.
>>>>>>>>>
>>>>>>>>> *Assumptions:*
>>>>>>>>> 1. At a given point in time, only a single Indexing Technology is used for
>>>>>>>>> text based indexing and searching via Jena. What this means is that we will
>>>>>>>>> either use Lucene Implementation OR Solr Implementation OR ES
>>>>>>>>> Implementation at any given point in time.
>>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but
>>>>>>>>> only on jena-text classes, if at all.
>>>>>>>>>
>>>>>>>>> Based on these assumptions it is possible to create a build that contains
>>>>>>>>> jena-text based common classes + ES specific classes without any
>>>>>>>>> compatibility issues. And it is in fact quite simple. I did it in the
>>>>>>>>> current jena-text-es module and ran the entire build, which succeeded.
>>>>>>>>> The key is to include the latest Lucene dependencies at the very beginning
>>>>>>>>> in the pom and then include the jena-text dependency. Maven will then
>>>>>>>>> automatically resolve the dependency issues by including the Lucene
>>>>>>>>> libraries that we included in our ES-specific pom. Have a look at the pom
>>>>>>>>> of the jena-text-es module here to see how it can be done:
>>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anuj Kumar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <osma.suominen@helsinki.fi>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Anuj,
>>>>>>>>>>
>>>>>>>>>> I understand your concerns. However, we also need to balance between the
>>>>>>>>>> needs of individual modules/features and the whole codebase. I'm willing to
>>>>>>>>>> put in the effort to keep the other modules up to date with newer Lucene
>>>>>>>>>> versions. Lucene upgrade requirements are well documented, the only hitches
>>>>>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>>>>>>> features that were dropped from newer versions.
>>>>>>>>>>
>>>>>>>>>> A perhaps stupid question to more experienced Java developers: is it even
>>>>>>>>>> possible to mix modules that depend on different versions of the Lucene
>>>>>>>>>> libraries within the same project? In my (quite limited) understanding of
>>>>>>>>>> Java projects and libraries, this requires special arrangements (e.g.
>>>>>>>>>> shading) as the Java package/class namespace is shared by all the code
>>>>>>>>>> running within the same JVM.
>>>>>>>>>>
>>>>>>>>>> So can you create, say, a Fuseki build that contains the current jena-text
>>>>>>>>>> module (depending on Lucene 4.x) and the new jena-text-es module (depending
>>>>>>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>>
>>>>>>>>>> -Osma
>>>>>>>>>>
>>>>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> My 2 Cents :
>>>>>>>>>>>
>>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and ES is
>>>>>>>>>>> exactly for avoiding the "All or Nothing" approach we need to take if we
>>>>>>>>>>> club them all together. If they stay together and if in the near future I
>>>>>>>>>>> want to upgrade ES to another version, I also need to again upgrade Lucene
>>>>>>>>>>> and Solr and possibly another implementation that may have been added
>>>>>>>>>>> during the time. As we all know, this means weeks of work if not months to
>>>>>>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>>>>>>> anything and I will probably start maintaining my version of Jena-Text as
>>>>>>>>>>> that would be much simpler to do than to upgrade and test and in the
>>>>>>>>>>> process own (read: fix bugs) the upgrade for each and every technology.
>>>>>>>>>>>
>>>>>>>>>>> If they are developed as separate modules, they can evolve independently of
>>>>>>>>>>> each other and we can avoid situations where we can't upgrade to the latest
>>>>>>>>>>> version of Lucene because we do not know what effect it will have on the
>>>>>>>>>>> Solr implementation.
>>>>>>>>>>>
>>>>>>>>>>> We can start with having a separate module for Jena Text ES and see how
>>>>>>>>>>> things go. If they go well, we could extract Solr and Lucene out of
>>>>>>>>>>> Jena Text.
>>>>>>>>>>>
>>>>>>>>>>> Again this is just a suggestion based on my limited industry experience.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>>>>>> ? In other words, might it be better to factor out between -text and
>>>>>>>>>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>>
>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer to do
>>>>>>>>>>>> the actual work!
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't use the Solr component now, but I could easily see so doing...
>>>>>>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any work to
>>>>>>>>>>>>> maintain it, so consider that just a very small and blurry data point.
>>>>>>>>>>>>> :)
>>>>>>>>>>>>
>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how to get
>>>>>>>>>>>> it running... If you could just try that with some toy data, then your data
>>>>>>>>>>>> point would be a lot less blurry :) I haven't used Solr for anything, so
>>>>>>>>>>>> I'm not very familiar with how to set it up, and the jena-text instructions
>>>>>>>>>>>> are pretty vague unfortunately.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Osma
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Osma Suominen
>>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>>> National Library of Finland
>>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Osma Suominen
>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>> National Library of Finland
>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Osma Suominen
>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>> National Library of Finland
>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>> Tel. +358 50 3199529
>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Osma Suominen
>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>> National Library of Finland
>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>> 00014 HELSINGIN YLIOPISTO
>>>>> Tel. +358 50 3199529
>>>>> osma.suominen@helsinki.fi
>>>>> http://www.nationallibrary.fi
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suominen@helsinki.fi
>> http://www.nationallibrary.fi
>>
>
>
>
> --
> *Anuj Kumar*
>



-- 
*Anuj Kumar*
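
A note on the classloader point quoted above (the "ANY GIVEN Java classloader only sees a single, flat package/class namespace" remark): the following is a minimal sketch of that behaviour, not code from Jena, Lucene or OSGi. The names Library and IsolatingLoader, and the version strings written into the static field, are hypothetical stand-ins chosen for illustration. It assumes Java 9 or later (for InputStream.readAllBytes) and that the compiled class file can be read back as a classpath resource. Two loaders each define their own copy of the same class name, and the copies do not share static state, which is roughly the property that per-bundle classloaders rely on.

```java
import java.io.IOException;
import java.io.InputStream;

public class ClassLoaderIsolationDemo {

    /** Hypothetical stand-in for a class from a versioned library such as Lucene. */
    public static class Library {
        public static String version = "unset";
    }

    /** Child-first loader: defines the target class itself instead of asking its parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(ClassLoader parent, String target) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve); // everything else: normal parent delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Read the same bytes the parent would use, but define the class in THIS loader.
                    String path = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(path)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Returns: copy A's version, copy B's version, the original's version, and whether A == B. */
    public static String[] demo() throws Exception {
        String name = Library.class.getName();
        ClassLoader parent = Library.class.getClassLoader();

        Class<?> a = new IsolatingLoader(parent, name).loadClass(name);
        Class<?> b = new IsolatingLoader(parent, name).loadClass(name);

        // Same class name, but each copy has its own static field.
        a.getField("version").set(null, "4.9.1");
        b.getField("version").set(null, "6.4.1");

        return new String[] {
            (String) a.getField("version").get(null),
            (String) b.getField("version").get(null),
            Library.version,
            String.valueOf(a == b)
        };
    }

    public static void main(String[] args) throws Exception {
        String[] r = demo();
        System.out.println("copy A version:   " + r[0]);
        System.out.println("copy B version:   " + r[1]);
        System.out.println("original version: " + r[2]);
        System.out.println("same Class object: " + r[3]);
    }
}
```

In a flat classpath (a single application classloader, as in a Fuseki JAR) there is only one such namespace, so this kind of isolation does not happen without extra machinery such as OSGi or shading.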

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
Hey,
 I just saw https://issues.apache.org/jira/browse/JENA-1301
Should we not first officially deprecate it and give any users of Solr a
chance to move to a different indexing technology?

BTW, I don't know yet how to log in to Apache JIRA.

Thanks,
Anuj Kumar

On Fri, Mar 3, 2017 at 1:23 PM, anuj kumar <an...@gmail.com> wrote:

> Hi Osma,
>  I briefly looked at the pull request. I believe we need to upgrade Lucene
> and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on Lucene
> 4.9.1.
>
> Also, how do I log in to issues.apache.org, and where do I file this issue?
>
> Thanks,
> Anuj Kumar
>
> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <os...@helsinki.fi>
> wrote:
>
>> Hi Anuj,
>>
>> It's great that we found agreement over this!
>>
>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
>> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
>> as an intermediate step). I'll wait for comments on the PR and if people
>> think it's OK I will merge it soon to Jena master. Meanwhile, you can
>> already base your ES implementation on that branch [2] if you like.
>>
>> Could you please open a JIRA issue on issues.apache.org explaining the
>> Elasticsearch support feature, so that we have a place for tracking this
>> work, requesting comments etc.?
>>
>> Also I suggest we move the discussion around this to the developers' list
>> (dev@jena.apache.org) where it's more appropriate.
>>
>> -Osma
>>
>> [1] https://github.com/apache/jena/pull/219
>>
>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>
>>
>> 03.03.2017, 02:45, anuj kumar wrote:
>>
>>> I second that. I am now finalising the integration of ES and should have a
>>> good production-quality implementation ready in a week's time. At that
>>> time I would want you guys to have a look at the implementation and provide
>>> feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge the
>>> code into the jena-text module and do a round of testing.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>
>>>> I do agree that trying to juggle different versions of Lucene libraries is
>>>> probably not a realistic option right now. Luckily (if I understand the
>>>> conversation thus far correctly) we have a solid alternative; getting our
>>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>>> Anuj's work into the mainstream of development. Someone please tell me if I
>>>> have that wrong! :grin:
>>>>
>>>> Let me reiterate that this seems like very good work and speaking for
>>>> myself, I certainly want to get it included into Jena. It's just a question
>>>> of fitting it in correctly, which might take a bit of time.
>>>>
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>>
>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>
>>>>> Hi Anuj!
>>>>>
>>>>> I have nothing against modularity in general. However, I cannot see how
>>>>> your proposal could work in practice for the Fuseki build, due to the
>>>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>>
>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>> again. If all current Jena modules (i.e. jena-text and jena-spatial) were
>>>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>>>> jena-text, right? I think that would be better for everyone than having to
>>>>> maintain your own separate module.
>>>>>
>>>>> -Osma
>>>>>
>>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>>
>>>>>> I personally have no preference as to how the code in Jena should be
>>>>>> structured, as long as I am able to use it :).
>>>>>> I have a personal preference for doing it in a specific way because, IMO,
>>>>>> it is modular, which makes it much easier to maintain in the long run. But
>>>>>> again, it may not be the quickest one.
>>>>>>
>>>>>> I have already been given a deadline by the company to have the ES extension
>>>>>> implemented in the next 15 days :). What this means is that I will be
>>>>>> maintaining the ES code extension to Jena Text at least locally for the
>>>>>> coming period of time. I would be more than happy to contribute to the Jena
>>>>>> community whatever is required to have a proper ElasticSearch
>>>>>> implementation in place, whether within the jena-text module or as a
>>>>>> separate module. Until Lucene and Solr are upgraded to the latest
>>>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>>>
>>>>>> Cheers!
>>>>>> Anuj Kumar
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>>
>>>>>>> Osma--
>>>>>>>
>>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>>> different versions of code accessible in different ways. The longer answer
>>>>>>> is that it's probably not a viable alternative for Jena for this problem,
>>>>>>> at least not without a lot of other change.
>>>>>>>
>>>>>>> You are right to point to the classloader mechanism as being at the heart
>>>>>>> of this question, but I must alter your remark just slightly. From "the
>>>>>>> Java classloader only sees a single, flat package/class namespace and a set
>>>>>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>>>>>> flat package/class namespace and a set of compiled classes".
>>>>>>>
>>>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>>>> module boundaries (and even dynamic module relationships at run-time). Each
>>>>>>> OSGi bundle sees its own classloader, and the framework is responsible for
>>>>>>> connecting bundles up to ensure that every bundle has what it needs in the
>>>>>>> way of types to function, based on metadata that the bundles provide to the
>>>>>>> framework. It's an incredibly powerful system (I use it every day and enjoy
>>>>>>> it enormously) but it's also very "heavy" and requires a good deal of
>>>>>>> investment to use. In particular, it's probably too large to put _inside_
>>>>>>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>>>>>>
>>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>>>> libraries. In theory, we could "roll our own" classloader management for
>>>>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>>>>> There might be another, more lightweight, toolkit out there to this
>>>>>>> purpose, but I'm not aware of any myself.
>>>>>>>
>>>>>>> Otherwise, yes, you get into shading and the like. We have to do that for
>>>>>>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>>>>>>> thing we want to do any more of than needed, I don't think.
>>>>>>>
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>>
>>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>>
>>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suominen@helsinki.fi> wrote:
>>>>>>>
>>>>>>>> Hi Anuj!
>>>>>>>>
>>>>>>>> Thanks for the clarification.
>>>>>>>>
>>>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>>>> convenient ways to structure a Java project. Maven cannot change the fact
>>>>>>>> that at runtime, module divisions don't really matter (except that they
>>>>>>>> usually correspond to package sub-namespaces) and the Java classloader only
>>>>>>>> sees a single, flat package/class namespace and a set of compiled classes
>>>>>>>> (usually within JARs) in the classpath that it needs to check to find the
>>>>>>>> right classes, and if there are two versions of the same library (e.g.
>>>>>>>> Lucene) with overlapping class names, that's going to cause trouble. The
>>>>>>>> only way around that is to shade some of the libraries, i.e. rename them so
>>>>>>>> that they end up in another, non-conflicting namespace. Apparently
>>>>>>>> Elasticsearch also did some of that in the past [1] but nowadays tries to
>>>>>>>> avoid it.
>>>>>>>>
>>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>>> Indexing Technology is used") imply that in the assembler configuration,
>>>>>>>> you cannot have ja:loadClass declarations for both Lucene and ES backends?
>>>>>>>> Or how do you run something like Fuseki that contains (in a single big JAR)
>>>>>>>> both the jena-text and jena-text-es modules with all their dependencies,
>>>>>>>> one of which requires the Lucene 4.x classes and the other one the Lucene
>>>>>>>> 6.4.1 classes? How do you ensure that only one of them is used at a time,
>>>>>>>> and that the Java classloader, even though it has access to both versions
>>>>>>>> of Lucene, only loads classes from the single, correct one and not the
>>>>>>>> other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
>>>>>>>> packages, so that you don't end up with two Lucene versions within the same
>>>>>>>> Fuseki JAR?
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>>
>>>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>>>
>>>>>>>>> Hi Osma,
>>>>>>>>>
>>>>>>>>> I understand what you are saying. There are ways to mitigate risks
>>>>>>>>>
>>>>>>>> and
>>>>
>>>>> balance the refactoring without affecting the existing modules. But I
>>>>>>>>>
>>>>>>>> will
>>>>>>>
>>>>>>>> not delve into those now. I am not an expert in Jena to convincingly
>>>>>>>>>
>>>>>>>> say
>>>>
>>>>> that it is possible, without any hiccups. But I can take a guess and
>>>>>>>>>
>>>>>>>> say
>>>>
>>>>> that it is indeed possible :)
>>>>>>>>>
>>>>>>>>> For the question: "is it even possible to mix modules that depend
>>>>>>>>> on
>>>>>>>>> different versions of the Lucene libraries within the same
>>>>>>>>> project?"
>>>>>>>>>
>>>>>>>>> I actually do not understand what you mean by mixing modules. I
>>>>>>>>>
>>>>>>>> assume
>>>>
>>>>> you
>>>>>>>
>>>>>>>> mean having jena-text and jena-text-es as dependencies in a build
>>>>>>>>>
>>>>>>>> without
>>>>>>>
>>>>>>>> causing the build to conflict. If that is what you mean then the
>>>>>>>>>
>>>>>>>> answer
>>>>
>>>>> is
>>>>>>>
>>>>>>>> yes it is possible and quite simple as well. Let me explain how it
>>>>>>>>> is
>>>>>>>>> possible. But before that some assumption which I want to call out
>>>>>>>>> explicitly.
>>>>>>>>>
>>>>>>>>> *Assumption:*
>>>>>>>>> 1. At a given point in time, only a single Indexing Technology is
>>>>>>>>>
>>>>>>>> used
>>>>
>>>>> for
>>>>>>>
>>>>>>>> text based indexing and searching via Jena. What this means is that
>>>>>>>>>
>>>>>>>> we
>>>>
>>>>> will
>>>>>>>
>>>>>>>> either use Lucene Implementation OR Solr Implementation OR ES
>>>>>>>>> Implementation at any given point in time.
>>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific
>>>>>>>>> classes
>>>>>>>>>
>>>>>>>> but
>>>>
>>>>> only on jena-text classes, if at all.
>>>>>>>>>
>>>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>>>>
>>>>>>>> contains
>>>>>>>
>>>>>>>> jena-text based common classes + ES specific classes without any
>>>>>>>>> compatibility issues. And it is in fact quite simple. I did it in
>>>>>>>>> the
>>>>>>>>> current jena-text-es module and ran the entire build which
>>>>>>>>> succeeded.
>>>>>>>>> The key is to include the latest Lucene dependencies at the very
>>>>>>>>>
>>>>>>>> beginning
>>>>>>>
>>>>>>>> in the pom and then include jena-text dependency. Maven will then
>>>>>>>>> automatically resolve the dependency issues by including the Lucene
>>>>>>>>> libraries that we included in our es specific pom. Have a look at the
>>>>>>>>>
>>>>>>>> pom
>>>>
>>>>> of
>>>>>>>
>>>>>>>> jena-text-es module here to see how it can be done :
>>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anuj Kumar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
>>>>>>>>>
>>>>>>>> osma.suominen@helsinki.fi>
>>>>>>>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Anuj,
>>>>>>>>>>
>>>>>>>>>> I understand your concerns. However, we also need to balance
>>>>>>>>>> between
>>>>>>>>>>
>>>>>>>>> the
>>>>>>>
>>>>>>>> needs of individual modules/features and the whole codebase. I'm
>>>>>>>>>>
>>>>>>>>> willing to
>>>>>>>
>>>>>>>> put in the effort to keep the other modules up to date with newer
>>>>>>>>>>
>>>>>>>>> Lucene
>>>>>>>
>>>>>>>> versions. Lucene upgrade requirements are well documented, the only
>>>>>>>>>>
>>>>>>>>> hitches
>>>>>>>
>>>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>>>>>>> features that were dropped from newer versions.
>>>>>>>>>>
>>>>>>>>>> A perhaps stupid question to more experienced Java developers: is
>>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>> even
>>>>>>>
>>>>>>>> possible to mix modules that depend on different versions of the
>>>>>>>>>>
>>>>>>>>> Lucene
>>>>
>>>>> libraries within the same project? In my (quite limited)
>>>>>>>>>>
>>>>>>>>> understanding
>>>>
>>>>> of
>>>>>>>
>>>>>>>> Java projects and libraries, this requires special arrangements
>>>>>>>>>>
>>>>>>>>> (e.g.
>>>>
>>>>> shading) as the Java package/class namespace is shared by all the
>>>>>>>>>>
>>>>>>>>> code
>>>>
>>>>> running within the same JVM.
>>>>>>>>>>
>>>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>>>>
>>>>>>>>> jena-text
>>>>>>>
>>>>>>>> module (depending on Lucene 4.x) and the new jena-text-es module
>>>>>>>>>>
>>>>>>>>> (depending
>>>>>>>
>>>>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>>
>>>>>>>>>> -Osma
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 01.03.2017, 00:47, anuj kumar kirjoitti:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> My 2 Cents :
>>>>>>>>>>>
>>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr
>>>>>>>>>>> and
>>>>>>>>>>>
>>>>>>>>>> ES is
>>>>>>>
>>>>>>>> exactly for avoiding the "All or Nothing" approach we need to take
>>>>>>>>>>>
>>>>>>>>>> if
>>>>
>>>>> we
>>>>>>>
>>>>>>>> club them all together. If they stay together and if in the near
>>>>>>>>>>>
>>>>>>>>>> future I
>>>>>>>
>>>>>>>> want to upgrade ES to another version, I also need to again upgrade
>>>>>>>>>>>
>>>>>>>>>> Lucene
>>>>>>>
>>>>>>>> and Solr and possibly another implementation that may have been
>>>>>>>>>>>
>>>>>>>>>> added
>>>>
>>>>> during the time. As we all know, this means weeks of work if not
>>>>>>>>>>>
>>>>>>>>>> months to
>>>>>>>
>>>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>>>>>>> anything and I will probably start maintaining my version of
>>>>>>>>>>>
>>>>>>>>>> Jena-Text as
>>>>>>>
>>>>>>>> that would be much simpler to do than to upgrade and test and in
>>>>>>>>>>>
>>>>>>>>>> the
>>>>
>>>>> process own(read fix bugs) the upgrade for each and every
>>>>>>>>>>>
>>>>>>>>>> technology.
>>>>
>>>>>
>>>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>>>>
>>>>>>>>>> independently
>>>>>>>
>>>>>>>> of
>>>>>>>>>>> each other and we can avoid situations where we can't upgrade to
>>>>>>>>>>>
>>>>>>>>>> latest
>>>>
>>>>> version of Lucene because we do not know what effect it will have
>>>>>>>>>>>
>>>>>>>>>> on
>>>>
>>>>> Solr
>>>>>>>
>>>>>>>> Implementation.
>>>>>>>>>>>
>>>>>>>>>>> We can start with having a separate Module for Jena Text ES and
>>>>>>>>>>> see
>>>>>>>>>>>
>>>>>>>>>> how
>>>>>>>
>>>>>>>> things go. If they go well, we could extract out Solr and Lucene
>>>>>>>>>>>
>>>>>>>>>> out
>>>>
>>>>> of
>>>>>>>
>>>>>>>> Jena Text.
>>>>>>>>>>>
>>>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>>>>
>>>>>>>>>> experience.
>>>>>>>
>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
>>>>>>>>>>>
>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>
>>>>>>>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
>>>>>>>>>>>>
>>>>>>>>>>>>> bb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.
>>>>>>>>>>>>>
>>>>>>>>>>>> apache.org
>>>>
>>>>> %3E
>>>>>>>
>>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>>>>
>>>>>>>>>>>> and
>>>>
>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to
>>>>>>>>>>>> volunteer
>>>>>>>>>>>>
>>>>>>>>>>> to do
>>>>>>>
>>>>>>>> the actual work!
>>>>>>>>>>>>
>>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>>>
>>>>>>>>>>> doing...
>>>>>>>
>>>>>>>>
>>>>>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any
>>>>>>>>>>>>>
>>>>>>>>>>>> work to
>>>>>>>
>>>>>>>> maintain it, so consider that just a very small and blurry data
>>>>>>>>>>>>>
>>>>>>>>>>>> point.
>>>>>>>
>>>>>>>> :)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out
>>>>>>>>>>>>
>>>>>>>>>>> how
>>>>
>>>>> to
>>>>>>>
>>>>>>>> get
>>>>>>>>>>>> it running... If you could just try that with some toy data,
>>>>>>>>>>>> then
>>>>>>>>>>>>
>>>>>>>>>>> your
>>>>>>>
>>>>>>>> data
>>>>>>>>>>>> point would be a lot less blurry :) I haven't used Solr for
>>>>>>>>>>>>
>>>>>>>>>>> anything, so
>>>>>>>
>>>>>>>> I'm not very familiar with how to set it up, and the jena-text
>>>>>>>>>>>> instructions
>>>>>>>>>>>> are pretty vague unfortunately.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Osma
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Osma Suominen
>>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>>> National Library of Finland
>>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Osma Suominen
>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>> National Library of Finland
>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Osma Suominen
>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>> National Library of Finland
>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>> Tel. +358 50 3199529
>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Osma Suominen
>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>> National Library of Finland
>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>> 00014 HELSINGIN YLIOPISTO
>>>>> Tel. +358 50 3199529
>>>>> osma.suominen@helsinki.fi
>>>>> http://www.nationallibrary.fi
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suominen@helsinki.fi
>> http://www.nationallibrary.fi
>>
>
>
>
> --
> *Anuj Kumar*
>



-- 
*Anuj Kumar*
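
One practical way to check the concern quoted above (two Lucene versions
reachable from one classpath, with no guarantee which one gets loaded) is to
ask the JVM where a class actually came from. The following is only a minimal
sketch: the class name WhereFrom and the method locationOf are made up for
illustration and are not part of Jena, Fuseki or Lucene. It relies only on the
standard java.security.CodeSource API.

```java
import java.security.CodeSource;

public class WhereFrom {

    /** Returns the JAR or directory a class was loaded from, or a fallback label. */
    public static String locationOf(Class<?> type) {
        // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(JDK runtime / unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a class name as the first argument, for example a Lucene class such as
        // org.apache.lucene.util.Version if Lucene is on your classpath.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> type = Class.forName(name);
        System.out.println(name + " <- " + locationOf(type)
                + " via " + type.getClassLoader());
    }
}
```

Run against a build that bundles both backends, this would show which single
JAR the class was resolved from, since a flat classpath yields only one copy
of any given class name.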

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
Hi Osma,
 I briefly looked at the pull request. I believe we need to upgrade Lucene
and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on Lucene
4.9.1.

Also, how do I log into issues.apache.org, and where do I file this bug?

Thanks,
Anuj Kumar

On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <os...@helsinki.fi>
wrote:

> Hi Anuj,
>
> It's great that we found agreement over this!
>
> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
> as an intermediate step). I'll wait for comments on the PR and if people
> think it's OK I will merge it soon to Jena master. Meanwhile, you can
> already base your ES implementation on that branch [2] if you like.
>
> Could you please open a JIRA issue on issues.apache.org explaining the
> Elasticsearch support feature, so that we have a place for tracking this
> work, request comments etc.
>
> Also I suggest we move the discussion around this to the developers' list (
> dev@jena.apache.org) where it's more appropriate.
>
> -Osma
>
> [1] https://github.com/apache/jena/pull/219
>
> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>
>
> 03.03.2017, 02:45, anuj kumar kirjoitti:
>
>> I second that. I am now finalising the integration of ES and should have a
>> good production quality implementation ready in a week's time.  At that
>> time I would want you guys to have a look at the implementation and
>> provide
>> feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge the
>> code in jena-text module and do a round of testing.
>>
>> Thanks,
>> Anuj Kumar
>>
>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>
>> I do agree that trying to juggle different versions of Lucene libraries is
>>> probably not a realistic option right now. Luckily (if I understand the
>>> conversation thus far correctly) we have a solid alternative; getting our
>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>> Anuj's work into the mainstream of development. Someone please tell me
>>> if I
>>> have that wrong! :grin:
>>>
>>> Let me reiterate that this seems like very good work and speaking for
>>> myself, I certainly want to get it included into Jena. It's just a
>>> question
>>> of fitting it in correctly, which might take a bit of time.
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi>
>>>>
>>> wrote:
>>>
>>>>
>>>> Hi Anuj!
>>>>
>>>> I have nothing against modularity in general. However, I cannot see how
>>>>
>>> your proposal could work in practice for the Fuseki build, due to the
>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>
>>>>
>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>
>>> again. If all current Jena modules (ie jena-text and jena-spatial) were
>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>> jena-text, right? I think that would be better for everyone than having
>>> to
>>> maintain your own separate module.
>>>
>>>>
>>>> -Osma
>>>>
>>>> 01.03.2017, 16:59, anuj kumar kirjoitti:
>>>>
>>>>> I personally have no preference as to how the code in Jena should be
>>>>> structured, as long as I am able to use it :).
>>>>> I have personal preference of doing it in a specific way because IMO,
>>>>>
>>>> it is
>>>
>>>> modular which makes it much easier to maintain in the long run. But
>>>>>
>>>> again
>>>
>>>> it may not be the quickest one.
>>>>>
>>>>> I already have been given a deadline, by the company to have ES
>>>>>
>>>> extension
>>>
>>>> implemented in the next 15 days :). What this means is that I will be
>>>>> maintaining the ES code extension to Jena Text at least locally for a
>>>>> coming period of time. I would be more than happy to contribute to Jena
>>>>> community whatever is required to have a proper ElasticSearch
>>>>> Implementation in place, whether within jena-text module or as a
>>>>>
>>>> separate
>>>
>>>> module. Until Lucene and Solr are upgraded to the latest
>>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>>
>>>>> Cheers!
>>>>> Anuj Kumar
>>>>>
>>>>>
>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>
>>>>>> Osma--
>>>>>>
>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>> different versions of code accessible in different ways. The longer
>>>>>> answer is that it's probably not a viable alternative for Jena for this
>>>>>> problem, at least not without a lot of other change.
>>>>>>
>>>>>> You are right to point to the classloader mechanism as being at the
>>>>>> heart of this question, but I must alter your remark just slightly. From
>>>>>> "the Java classloader only sees a single, flat package/class namespace
>>>>>> and a set of compiled classes" to "ANY GIVEN Java classloader only sees
>>>>>> a single, flat package/class namespace and a set of compiled classes".
>>>>>>
>>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>>> module boundaries (and even dynamic module relationships at run-time).
>>>>>> Each OSGi bundle sees its own classloader, and the framework is
>>>>>> responsible for connecting bundles up to ensure that every bundle has
>>>>>> what it needs in the way of types to function, based on metadata that
>>>>>> the bundles provide to the framework. It's an incredibly powerful system
>>>>>> (I use it every day and enjoy it enormously) but it's also very "heavy"
>>>>>> and requires a good deal of investment to use. In particular, it's
>>>>>> probably too large to put _inside_ Jena. (I frequently put Jena inside
>>>>>> an OSGi instance, on the other hand.)
>>>>>>
>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>>> libraries. In theory, we could "roll our own" classloader management for
>>>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>>>> There might be another, more lightweight, toolkit out there to this
>>>>>> purpose, but I'm not aware of any myself.
>>>>>>
>>>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>>>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
>>>>>> hardly a thing we want to do any more of than needed, I don't think.
>>>>>>
>>>>>> ---
>>>>>> A. Soroka
>>>>>> The University of Virginia Library
>>>>>>
>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>
>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi>
>>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Anuj!
>>>>>>>
>>>>>>> Thanks for the clarification.
>>>>>>>
>>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>>> convenient ways to structure a Java project. Maven cannot change the
>>>>>>> fact that at runtime, module divisions don't really matter (except that
>>>>>>> they usually correspond to package sub-namespaces) and the Java
>>>>>>> classloader only sees a single, flat package/class namespace and a set
>>>>>>> of compiled classes (usually within JARs) in the classpath that it
>>>>>>> needs to check to find the right classes, and if there are two versions
>>>>>>> of the same library (eg Lucene) with overlapping class names, that's
>>>>>>> going to cause trouble. The only way around that is to shade some of
>>>>>>> the libraries, i.e. rename them so that they end up in another,
>>>>>>> non-conflicting namespace. Apparently Elasticsearch also did some of
>>>>>>> that in the past [1] but nowadays tries to avoid it.
>>>>>>>
>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>>>>> modules with all their dependencies, one of which requires the Lucene
>>>>>>> 4.x classes and the other one the Lucene 6.4.1 classes? How do you
>>>>>>> ensure that only one of them is used at a time, and that the Java
>>>>>>> classloader, even though it has access to both versions of Lucene, only
>>>>>>> loads classes from the single, correct one and not the other? Or do you
>>>>>>> need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that
>>>>>>> you don't end up with two Lucene versions within the same Fuseki JAR?
>>>>>>
>>>>>>>
>>>>>>> -Osma
>>>>>>>
>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>
>>>>>>> 01.03.2017, 11:03, anuj kumar kirjoitti:
>>>>>>>
>>>>>>>> Hi Osma,
>>>>>>>>
>>>>>>>> I understand what you are saying. There are ways to mitigate risks
>>>>>>>> and balance the refactoring without affecting the existing modules.
>>>>>>>> But I will not delve into those now. I am not an expert in Jena to
>>>>>>>> convincingly say that it is possible, without any hiccups. But I can
>>>>>>>> take a guess and say that it is indeed possible :)
>>>>>>>>
>>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>>
>>>>>>>> I actually do not understand what you mean by mixing modules. I
>>>>>>>> assume you mean having jena-text and jena-text-es as dependencies in
>>>>>>>> a build without causing the build to conflict. If that is what you
>>>>>>>> mean then the answer is yes, it is possible and quite simple as well.
>>>>>>>> Let me explain how it is possible. But before that, some assumptions
>>>>>>>> which I want to call out explicitly.
>>>>>>>>
>>>>>>>> *Assumptions:*
>>>>>>>> 1. At a given point in time, only a single Indexing Technology is
>>>>>>>> used for text based indexing and searching via Jena. What this means
>>>>>>>> is that we will either use Lucene Implementation OR Solr
>>>>>>>> Implementation OR ES Implementation at any given point in time.
>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>>>> but only on jena-text classes, if at all.
>>>>>>>>
>>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>>> contains jena-text based common classes + ES specific classes without
>>>>>>>> any compatibility issues. And it is in fact quite simple. I did it in
>>>>>>>> the current jena-text-es module and ran the entire build which
>>>>>>>> succeeded. The key is to include the latest Lucene dependencies at
>>>>>>>> the very beginning in the pom and then include jena-text dependency.
>>>>>>>> Maven will then automatically resolve the dependency issues by
>>>>>>>> including the Lucene libraries that we included in our es specific
>>>>>>>> pom. Have a look at the pom of jena-text-es module here to see how it
>>>>>>>> can be done :
>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anuj Kumar
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
>>>>>>>>
>>>>>>> osma.suominen@helsinki.fi>
>>>>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Anuj,
>>>>>>>>>
>>>>>>>>> I understand your concerns. However, we also need to balance
>>>>>>>>> between
>>>>>>>>>
>>>>>>>> the
>>>>>>
>>>>>>> needs of individual modules/features and the whole codebase. I'm
>>>>>>>>>
>>>>>>>> willing to
>>>>>>
>>>>>>> put in the effort to keep the other modules up to date with newer
>>>>>>>>>
>>>>>>>> Lucene
>>>>>>
>>>>>>> versions. Lucene upgrade requirements are well documented, the only
>>>>>>>>>
>>>>>>>> hitches
>>>>>>
>>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>>>>>> features that were dropped from newer versions.
>>>>>>>>>
>>>>>>>>> A perhaps stupid question to more experienced Java developers: is
>>>>>>>>> it
>>>>>>>>>
>>>>>>>> even
>>>>>>
>>>>>>> possible to mix modules that depend on different versions of the
>>>>>>>>>
>>>>>>>> Lucene
>>>
>>>> libraries within the same project? In my (quite limited)
>>>>>>>>>
>>>>>>>> understanding
>>>
>>>> of
>>>>>>
>>>>>>> Java projects and libraries, this requires special arrangements
>>>>>>>>>
>>>>>>>> (e.g.
>>>
>>>> shading) as the Java package/class namespace is shared by all the
>>>>>>>>>
>>>>>>>> code
>>>
>>>> running within the same JVM.
>>>>>>>>>
>>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>>>
>>>>>>>> jena-text
>>>>>>
>>>>>>> module (depending on Lucene 4.x) and the new jena-text-es module
>>>>>>>>>
>>>>>>>> (depending
>>>>>>
>>>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>
>>>>>>>>> -Osma
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 01.03.2017, 00:47, anuj kumar kirjoitti:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> My 2 Cents :
>>>>>>>>>>
>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>> ES is
>>>>>>
>>>>>>> exactly for avoiding the "All or Nothing" approach we need to take
>>>>>>>>>>
>>>>>>>>> if
>>>
>>>> we
>>>>>>
>>>>>>> club them all together. If they stay together and if in the near
>>>>>>>>>>
>>>>>>>>> future I
>>>>>>
>>>>>>> want to upgrade ES to another version, I also need to again upgrade
>>>>>>>>>>
>>>>>>>>> Lucene
>>>>>>
>>>>>>> and Solr and possibly another implementation that may have been
>>>>>>>>>>
>>>>>>>>> added
>>>
>>>> during the time. As we all know, this means weeks of work if not
>>>>>>>>>>
>>>>>>>>> months to
>>>>>>
>>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>>>>>> anything and I will probably start maintaining my version of
>>>>>>>>>>
>>>>>>>>> Jena-Text as
>>>>>>
>>>>>>> that would be much simpler to do than to upgrade and test and in
>>>>>>>>>>
>>>>>>>>> the
>>>
>>>> process own(read fix bugs) the upgrade for each and every
>>>>>>>>>>
>>>>>>>>> technology.
>>>
>>>>
>>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>>>
>>>>>>>>> independently
>>>>>>
>>>>>>> of
>>>>>>>>>> each other and we can avoid situations where we can't upgrade to
>>>>>>>>>>
>>>>>>>>> latest
>>>
>>>> version of Lucene because we do not know what effect it will have
>>>>>>>>>>
>>>>>>>>> on
>>>
>>>> Solr
>>>>>>
>>>>>>> Implementation.
>>>>>>>>>>
>>>>>>>>>> We can start with having a separate Module for Jena Text ES and
>>>>>>>>>> see
>>>>>>>>>>
>>>>>>>>> how
>>>>>>
>>>>>>> things go. If they go well, we could extract out Solr and Lucene
>>>>>>>>>>
>>>>>>>>> out
>>>
>>>> of
>>>>>>
>>>>>>> Jena Text.
>>>>>>>>>>
>>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>>>
>>>>>>>>> experience.
>>>>>>
>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Anuj Kumar
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
>>>>>>>>>>
>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>
>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
>>>>>>>>>>>
>>>>>>>>>>>> bb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.
>>>>>>>>>>>>
>>>>>>>>>>> apache.org
>>>
>>>> %3E
>>>>>>
>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>>>
>>>>>>>>>>> and
>>>
>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to
>>>>>>>>>>> volunteer
>>>>>>>>>>>
>>>>>>>>>> to do
>>>>>>
>>>>>>> the actual work!
>>>>>>>>>>>
>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>>
>>>>>>>>>> doing...
>>>>>>
>>>>>>>
>>>>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any
>>>>>>>>>>>>
>>>>>>>>>>> work to
>>>>>>
>>>>>>> maintain it, so consider that just a very small and blurry data
>>>>>>>>>>>>
>>>>>>>>>>> point.
>>>>>>
>>>>>>> :)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out
>>>>>>>>>>>
>>>>>>>>>> how
>>>
>>>> to
>>>>>>
>>>>>>> get
>>>>>>>>>>> it running... If you could just try that with some toy data, then
>>>>>>>>>>>
>>>>>>>>>> your
>>>>>>
>>>>>>> data
>>>>>>>>>>> point would be a lot less blurry :) I haven't used Solr for
>>>>>>>>>>>
>>>>>>>>>> anything, so
>>>>>>
>>>>>>> I'm not very familiar with how to set it up, and the jena-text
>>>>>>>>>>> instructions
>>>>>>>>>>> are pretty vague unfortunately.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Osma
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Osma Suominen
>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>> National Library of Finland
>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Osma Suominen
>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>> National Library of Finland
>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Osma Suominen
>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>> National Library of Finland
>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>> Tel. +358 50 3199529
>>>>>>> osma.suominen@helsinki.fi
>>>>>>> http://www.nationallibrary.fi
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Osma Suominen
>>>> D.Sc. (Tech), Information Systems Specialist
>>>> National Library of Finland
>>>> P.O. Box 26 (Kaikukatu 4)
>>>> 00014 HELSINGIN YLIOPISTO
>>>> Tel. +358 50 3199529
>>>> osma.suominen@helsinki.fi
>>>> http://www.nationallibrary.fi
>>>>
>>>
>>>
>>>
>>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suominen@helsinki.fi
> http://www.nationallibrary.fi
>



-- 
*Anuj Kumar*
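
The remark quoted above, that "ANY GIVEN Java classloader only sees a single,
flat package/class namespace", can be illustrated with plain JDK classes. The
sketch below is an assumption-laden toy and not how Jena, Fuseki or OSGi are
implemented: the names IsolatedLoaders, Library, IsolatingLoader and loadCopy
are invented here, Library merely stands in for a versioned library class, and
the code assumes Java 9 or later with the compiled .class files available on
the classpath. Two loader instances each define their own copy of the same
class name, and each copy keeps its own static state.

```java
import java.io.IOException;
import java.io.InputStream;

public class IsolatedLoaders {

    /** Hypothetical stand-in for a library class that exists in several versions. */
    public static class Library {
        public static String version = "unset";
    }

    /** Defines the target class itself instead of delegating to the parent loader. */
    static class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(ClassLoader parent, String target) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Re-read the class bytes and define a private copy in this loader.
                    String path = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(path)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Loads a fresh copy of Library in its own loader and stamps it with a version. */
    public static Class<?> loadCopy(String version) throws Exception {
        String name = Library.class.getName();
        ClassLoader parent = IsolatedLoaders.class.getClassLoader();
        Class<?> copy = new IsolatingLoader(parent, name).loadClass(name);
        copy.getField("version").set(null, version);
        return copy;
    }

    public static void main(String[] args) throws Exception {
        Class<?> a = loadCopy("4.9.1");
        Class<?> b = loadCopy("6.4.1");
        System.out.println(a.getName().equals(b.getName())); // true: same class name
        System.out.println(a == b);                          // false: distinct classes
        System.out.println(a.getField("version").get(null)
                + " / " + b.getField("version").get(null));  // 4.9.1 / 6.4.1
    }
}
```

Without this kind of per-loader isolation (or shading), a single classpath
holds only one class per name, which is the situation the thread describes for
a single Fuseki JAR.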

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
Hi Osma,
 I briefly looked at the pull request. I believe we need to upgrade Lucene
and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on Lucene
4.9.1.

Also, how do I log into issues.apache.org, and where do I file this bug?

Thanks,
Anuj Kumar

On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <os...@helsinki.fi>
wrote:

> Hi Anuj,
>
> It's great that we found agreement over this!
>
> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
> made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
> as an intermediate step). I'll wait for comments on the PR and if people
> think it's OK I will merge it soon to Jena master. Meanwhile, you can
> already base your ES implementation on that branch [2] if you like.
>
> Could you please open a JIRA issue on issues.apache.org explaining the
> Elasticsearch support feature, so that we have a place for tracking this
> work, request comments etc.
>
> Also I suggest we move the discussion around this to the developers' list (
> dev@jena.apache.org) where it's more appropriate.
>
> -Osma
>
> [1] https://github.com/apache/jena/pull/219
>
> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>
>
> 03.03.2017, 02:45, anuj kumar kirjoitti:
>
>> I second that. I am now finalising the integration of ES and should have a
>> good production quality implementation ready in a week's time.  At that
>> time I would want you guys to have a look at the implementation and
>> provide
>> feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge the
>> code in jena-text module and do a round of testing.
>>
>> Thanks,
>> Anuj Kumar
>>
>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>
>> I do agree that trying to juggle different versions of Lucene libraries is
>>> probably not a realistic option right now. Luckily (if I understand the
>>> conversation thus far correctly) we have a solid alternative; getting our
>>> current Lucene dependency upgraded should allow us to (eventually) merge
>>> Anuj's work into the mainstream of development. Someone please tell me
>>> if I
>>> have that wrong! :grin:
>>>
>>> Let me reiterate that this seems like very good work and speaking for
>>> myself, I certainly want to get it included into Jena. It's just a
>>> question
>>> of fitting it in correctly, which might take a bit of time.
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi>
>>>>
>>> wrote:
>>>
>>>>
>>>> Hi Anuj!
>>>>
>>>> I have nothing against modularity in general. However, I cannot see how
>>>>
>>> your proposal could work in practice for the Fuseki build, due to the
>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>
>>>>
>>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>>>
>>> again. If all current Jena modules (ie jena-text and jena-spatial) were
>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>> jena-text, right? I think that would be better for everyone than having
>>> to
>>> maintain your own separate module.
>>>
>>>>
>>>> -Osma
>>>>
>>>> 01.03.2017, 16:59, anuj kumar kirjoitti:
>>>>
>>>>> I personally have no preference as to how the code in Jena should be
>>>>> structured, as long as I am able to use it :).
>>>>> I have personal preference of doing it in a specific way because IMO,
>>>>>
>>>> it is
>>>
>>>> modular which makes it much easier to maintain in the long run. But
>>>>>
>>>> again
>>>
>>>> it may not be the quickest one.
>>>>>
>>>>> I already have been given a deadline, by the company to have ES
>>>>>
>>>> extension
>>>
>>>> implemented in the next 15 days :). What this means is that I will be
>>>>> maintaining the ES code extension to Jena Text at-least locally for a
>>>>> coming period of time. I would be more than happy to contribute to Jena
>>>>> community whatever is required to have a proper ElasticSearch
>>>>> Implementation in place, whether within jena-text module or as a
>>>>>
>>>> separate
>>>
>>>> module. Till the time Lucene and Solr is not upgraded to the latest
>>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>>
>>>>> Cheers!
>>>>> Anuj Kumar
>>>>>
>>>>>
>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>
>>>>> Osma--
>>>>>>
>>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>>> different versions of code accessible in different ways. The longer
>>>>>>
>>>>> answer
>>>
>>>> is that it's probably not a viable alternative for Jena for this
>>>>>>
>>>>> problem,
>>>
>>>> at least not without a lot of other change.
>>>>>>
>>>>>> You are right to point to the classloader mechanism as being at the
>>>>>>
>>>>> heart
>>>
>>>> of this question, but I must alter your remark just slightly. From "the
>>>>>> Java classloader only sees a single, flat package/class namespace and
>>>>>>
>>>>> a set
>>>
>>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>>>>> flat package/class namespace and a set of compiled classes".
>>>>>>
>>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>>> module boundaries (and even dynamic module relationships at run-time).
>>>>>>
>>>>> Each
>>>
>>>> OSGi bundle sees its own classloader, and the framework is responsible
>>>>>>
>>>>> for
>>>
>>>> connecting bundles up to ensure that every bundle has what it needs in
>>>>>>
>>>>> the
>>>
>>>> way of types to function, based on metadata that the bundles provide
>>>>>>
>>>>> to the
>>>
>>>> framework. It's an incredibly powerful system (I use it every day and
>>>>>>
>>>>> enjoy
>>>
>>>> it enormously) but it's also very "heavy" and requires a good deal of
>>>>>> investment to use. In particular, it's probably too large to put
>>>>>>
>>>>> _inside_
>>>
>>>> Jena. (I frequently put Jena inside an OSGi instance, on the other
>>>>>>
>>>>> hand.)
>>>
>>>>
>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>>> libraries. In theory, we could "roll our own" classloader management
>>>>>>
>>>>> for
>>>
>>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>>>>> There might be another, more lightweight, toolkit out there to this
>>>>>> purpose, but I'm not aware of any myself.
>>>>>>
>>>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>>>>
>>>>> for
>>>
>>>> Guava for now because of HADOOP-10101 (grumble grumble) but it's
>>>>>>
>>>>> hardly a
>>>
>>>> thing we want to do any more of than needed, I don't think.
>>>>>>
>>>>>> ---
>>>>>> A. Soroka
>>>>>> The University of Virginia Library
>>>>>>
>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>
>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi>
>>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Anuj!
>>>>>>>
>>>>>>> Thanks for the clarification.
>>>>>>>
>>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>>>
>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>> convenient ways to structure a Java project. Maven cannot change the
>>>>>>
>>>>> fact
>>>
>>>> that at runtime, module divisions don't really matter (except that they
>>>>>> usually correspond to package sub-namespaces) and the Java classloader
>>>>>>
>>>>> only
>>>
>>>> sees a single, flat package/class namespace and a set of compiled
>>>>>>
>>>>> classes
>>>
>>>> (usually within JARs) in the classpath that it needs to check to find
>>>>>>
>>>>> the
>>>
>>>> right classes, and if there are two versions of the same library (eg
>>>>>> Lucene) with overlapping class names, that's going to cause trouble.
>>>>>>
>>>>> The
>>>
>>>> only way around that is to shade some of the libraries, i.e. rename
>>>>>>
>>>>> them so
>>>
>>>> that they end up in another, non-conflicting namespace. Apparently
>>>>>> Elasticsearch also did some of that in the past [1] but nowadays tries
>>>>>>
>>>>> to
>>>
>>>> avoid it.
>>>>>>
>>>>>>>
>>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>>>
>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>>
>>>>> configuration,
>>>
>>>> you cannot have ja:loadClass declarations for both Lucene and ES
>>>>>>
>>>>> backends?
>>>
>>>> Or how do you run something like Fuseki that contains (in a single big
>>>>>>
>>>>> JAR)
>>>
>>>> both the jena-text and jena-text-es modules with all their
>>>>>>
>>>>> dependencies,
>>>
>>>> one of which requires the Lucene 4.x classes and the other one the
>>>>>>
>>>>> Lucene
>>>
>>>> 6.4.1 classes? How do you ensure that only one of them is used at a
>>>>>>
>>>>> time,
>>>
>>>> and that the Java classloader, even though it has access to both
>>>>>>
>>>>> versions
>>>
>>>> of Lucene, only loads classes from the single, correct one and not the
>>>>>> other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
>>>>>> packages, so that you don't end up with two Lucene versions within the
>>>>>>
>>>>> same
>>>
>>>> Fuseki JAR?
>>>>>>
>>>>>>>
>>>>>>> -Osma
>>>>>>>
>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>
>>>>>>> 01.03.2017, 11:03, anuj kumar kirjoitti:
>>>>>>>
>>>>>>>> Hi Osma,
>>>>>>>>
>>>>>>>> I understand what you are saying. There are ways to mitigate risks
>>>>>>>>
>>>>>>> and
>>>
>>>> balance the refactoring without affecting the existing modules. But I
>>>>>>>>
>>>>>>> will
>>>>>>
>>>>>>> not delve into those now. I am not an expert in Jena to convincingly
>>>>>>>>
>>>>>>> say
>>>
>>>> that it is possible, without any hiccups. But I can take a guess and
>>>>>>>>
>>>>>>> say
>>>
>>>> that it is indeed possible :)
>>>>>>>>
>>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>>
>>>>>>>> I actually do not understand what you mean by mixing modules. I
>>>>>>>>
>>>>>>> assume
>>>
>>>> you
>>>>>>
>>>>>>> mean having jena-text and jena-text-es as dependencies in a build
>>>>>>>>
>>>>>>> without
>>>>>>
>>>>>>> causing the build to conflict. If that is what you mean than the
>>>>>>>>
>>>>>>> answer
>>>
>>>> is
>>>>>>
>>>>>>> yes it is possible and quite simple as well. Let me explain how it is
>>>>>>>> possible. But before that some assumption which I want to call out
>>>>>>>> explicitly.
>>>>>>>>
>>>>>>>> *Assumption:*
>>>>>>>> 1. At a given point in time, only a single Indexing Technology is
>>>>>>>>
>>>>>>> used
>>>
>>>> for
>>>>>>
>>>>>>> text based indexing and searching via Jean. What this means is that
>>>>>>>>
>>>>>>> we
>>>
>>>> will
>>>>>>
>>>>>>> either use Lucene Implementation OR Solr Implementation OR ES
>>>>>>>> Implementation at any given point in time.
>>>>>>>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>>>>>>>>
>>>>>>> but
>>>
>>>> only on jena-text classes, if at all.
>>>>>>>>
>>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>>>
>>>>>>> contains
>>>>>>
>>>>>>> jena-text based common classes + ES specific classes without any
>>>>>>>> compatibility issues. And it is infact quite simple. I did it in the
>>>>>>>> current jena-text-es module and ran the entire build which
>>>>>>>> succeeded.
>>>>>>>> The key is to include the latest Lucene dependencies at the very
>>>>>>>>
>>>>>>> beginning
>>>>>>
>>>>>>> in the pom and then include jena-text dependency. Maven will then
>>>>>>>> automatically resolve the dependency issues by including the Lucene
>>>>>>>> librarires that we included in our es specific pom. Have a look the
>>>>>>>>
>>>>>>> pom
>>>
>>>> of
>>>>>>
>>>>>>> jena-text-es module here to see how it can be done :
>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anuj Kumar
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
>>>>>>>>
>>>>>>> osma.suominen@helsinki.fi>
>>>>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Anuj,
>>>>>>>>>
>>>>>>>>> I understand your concerns. However, we also need to balance
>>>>>>>>> between
>>>>>>>>>
>>>>>>>> the
>>>>>>
>>>>>>> needs of individual modules/features and the whole codebase. I'm
>>>>>>>>>
>>>>>>>> willing to
>>>>>>
>>>>>>> put in the effort to keep the other modules up to date with newer
>>>>>>>>>
>>>>>>>> Lucene
>>>>>>
>>>>>>> versions. Lucene upgrade requirements are well documented, the only
>>>>>>>>>
>>>>>>>> hitches
>>>>>>
>>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>>>>>> features that were dropped from newer versions.
>>>>>>>>>
>>>>>>>>> A perhaps stupid question to more experienced Java developers: is
>>>>>>>>> it
>>>>>>>>>
>>>>>>>> even
>>>>>>
>>>>>>> possible to mix modules that depend on different versions of the
>>>>>>>>>
>>>>>>>> Lucene
>>>
>>>> libraries within the same project? In my (quite limited)
>>>>>>>>>
>>>>>>>> understanding
>>>
>>>> of
>>>>>>
>>>>>>> Java projects and libraries, this requires special arrangements
>>>>>>>>>
>>>>>>>> (e.g.
>>>
>>>> shading) as the Java package/class namespace is shared by all the
>>>>>>>>>
>>>>>>>> code
>>>
>>>> running within the same JVM.
>>>>>>>>>
>>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>>>
>>>>>>>> jena-text
>>>>>>
>>>>>>> module (depending on Lucene 4.x) and the new jena-text-es module
>>>>>>>>>
>>>>>>>> (depending
>>>>>>
>>>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>
>>>>>>>>> -Osma
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 01.03.2017, 00:47, anuj kumar kirjoitti:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> My 2 Cents :
>>>>>>>>>>
>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>> ES is
>>>>>>
>>>>>>> exactly for avoiding the "All or Nothing" approach we need to take
>>>>>>>>>>
>>>>>>>>> if
>>>
>>>> we
>>>>>>
>>>>>>> club them all together. If they stay together and if in the near
>>>>>>>>>>
>>>>>>>>> future I
>>>>>>
>>>>>>> want to upgrade ES to another version, I also need to again upgrade
>>>>>>>>>>
>>>>>>>>> Lucene
>>>>>>
>>>>>>> and Solr and possibly another implementation that may have been
>>>>>>>>>>
>>>>>>>>> added
>>>
>>>> during the time. As we all know, this means weeks of work if not
>>>>>>>>>>
>>>>>>>>> months to
>>>>>>
>>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>>>>>> anything and I will probably start maintaining my version of
>>>>>>>>>>
>>>>>>>>> Jena-Text as
>>>>>>
>>>>>>> that would be much simpler to do than to upgrade and test and in
>>>>>>>>>>
>>>>>>>>> the
>>>
>>>> process own(read fix bugs) the upgrade for each and every
>>>>>>>>>>
>>>>>>>>> technology.
>>>
>>>>
>>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>>>
>>>>>>>>> independently
>>>>>>
>>>>>>> of
>>>>>>>>>> each other and we can avoid situations where we cant upgrade to
>>>>>>>>>>
>>>>>>>>> latest
>>>
>>>> version of Lucene because we do not know what effect it will have
>>>>>>>>>>
>>>>>>>>> on
>>>
>>>> Solr
>>>>>>
>>>>>>> Implementation.
>>>>>>>>>>
>>>>>>>>>> We can start with having a separate Module for Jena Text ES and
>>>>>>>>>> see
>>>>>>>>>>
>>>>>>>>> how
>>>>>>
>>>>>>> things go. If they go well, we could extract out Solr and Lucene
>>>>>>>>>>
>>>>>>>>> out
>>>
>>>> of
>>>>>>
>>>>>>> Jena Text.
>>>>>>>>>>
>>>>>>>>>> Again this is just a suggestion based on my limited industry
>>>>>>>>>>
>>>>>>>>> experience.
>>>>>>
>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Anuj Kumar
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
>>>>>>>>>>
>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>
>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
>>>>>>>>>>>
>>>>>>>>>>>> bb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.
>>>>>>>>>>>>
>>>>>>>>>>> apache.org
>>>
>>>> %3E
>>>>>>
>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>>>
>>>>>>>>>>> and
>>>
>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to
>>>>>>>>>>> volunteer
>>>>>>>>>>>
>>>>>>>>>> to do
>>>>>>
>>>>>>> the actual work!
>>>>>>>>>>>
>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>>
>>>>>>>>>> doing...
>>>>>>
>>>>>>>
>>>>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any
>>>>>>>>>>>>
>>>>>>>>>>> work to
>>>>>>
>>>>>>> maintain it, so consider that just a very small and blurry data
>>>>>>>>>>>>
>>>>>>>>>>> point.
>>>>>>
>>>>>>> :)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out
>>>>>>>>>>>
>>>>>>>>>> how
>>>
>>>> to
>>>>>>
>>>>>>> get
>>>>>>>>>>> it running... If you could just try that with some toy data, then
>>>>>>>>>>>
>>>>>>>>>> your
>>>>>>
>>>>>>> data
>>>>>>>>>>> point would be a lot less blurry :) I haven't used Solr for
>>>>>>>>>>>
>>>>>>>>>> anything, so
>>>>>>
>>>>>>> I'm not very familiar with how to set it up, and the jena-text
>>>>>>>>>>> instructions
>>>>>>>>>>> are pretty vague unfortunately.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Osma
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Osma Suominen
>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>> National Library of Finland
>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Osma Suominen
>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>> National Library of Finland
>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>> osma.suominen@helsinki.fi
>>>>>>>>> http://www.nationallibrary.fi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Osma Suominen
>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>> National Library of Finland
>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>> Tel. +358 50 3199529
>>>>>>> osma.suominen@helsinki.fi
>>>>>>> http://www.nationallibrary.fi
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Osma Suominen
>>>> D.Sc. (Tech), Information Systems Specialist
>>>> National Library of Finland
>>>> P.O. Box 26 (Kaikukatu 4)
>>>> 00014 HELSINGIN YLIOPISTO
>>>> Tel. +358 50 3199529
>>>> osma.suominen@helsinki.fi
>>>> http://www.nationallibrary.fi
>>>>
>>>
>>>
>>>
>>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suominen@helsinki.fi
> http://www.nationallibrary.fi
>



-- 
*Anuj Kumar*

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Anuj,

It's great that we found agreement over this!

I've restarted the Lucene upgrade effort (JENA-1250) that had stalled 
and made a PR [1] that implements the upgrade up to version 6.4.1 (with 
5.5.4 as an intermediate step). I'll wait for comments on the PR and if 
people think it's OK I will merge it soon to Jena master. Meanwhile, you 
can already base your ES implementation on that branch [2] if you like.

Could you please open a JIRA issue on issues.apache.org explaining the
Elasticsearch support feature, so that we have a place for tracking this
work, requesting comments etc.?

Also I suggest we move the discussion around this to the developers' 
list (dev@jena.apache.org) where it's more appropriate.

-Osma

[1] https://github.com/apache/jena/pull/219

[2] https://github.com/osma/jena/tree/jena-1250-lucene6

03.03.2017, 02:45, anuj kumar wrote:
> I second that. I am now finalising the integration of ES and should have a
> good production quality implementation ready in a week's time.  At that
> time I would want you guys to have a look at the implementation and provide
> feedback. Once you guys have upgraded Lucene to 6.4.1 , I can merge the
> code in jena-text module and do a round of testing.
>
> Thanks,
> Anuj Kumar
>
> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>
>> I do agree that trying to juggle different versions of Lucene libraries is
>> probably not a realistic option right now. Luckily (if I understand the
>> conversation thus far correctly) we have a solid alternative; getting our
>> current Lucene dependency upgraded should allow us to (eventually) merge
>> Anuj's work into the mainstream of development. Someone please tell me if I
>> have that wrong! :grin:
>>
>> Let me reiterate that this seems like very good work and speaking for
>> myself, I certainly want to get it included into Jena. It's just a question
>> of fitting it in correctly, which might take a bit of time.
>>
>> ---
>> A. Soroka
>> The University of Virginia Library
>>
>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi> wrote:
>>>
>>> Hi Anuj!
>>>
>>> I have nothing against modularity in general. However, I cannot see how
>>> your proposal could work in practice for the Fuseki build, due to the
>>> reasons I mentioned in my previous message (and Adam seemed to concur).
>>>
>>> In any case, I'll see what I can do to get the Lucene upgrade moving
>>> again. If all current Jena modules (ie jena-text and jena-spatial) were
>>> upgraded to Lucene 6.4.1, then you could just add your ES classes to
>>> jena-text, right? I think that would be better for everyone than having
>>> to maintain your own separate module.
>>>
>>> -Osma
>>>
>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>> I personally have no preference as to how the code in Jena should be
>>>> structured, as long as I am able to use it :).
>>>> I have a personal preference for doing it in a specific way because,
>>>> IMO, it is modular, which makes it much easier to maintain in the long
>>>> run. But again, it may not be the quickest one.
>>>>
>>>> I have already been given a deadline by the company to have the ES
>>>> extension implemented in the next 15 days :). What this means is that I
>>>> will be maintaining the ES code extension to Jena Text at least locally
>>>> for some time to come. I would be more than happy to contribute to the
>>>> Jena community whatever is required to have a proper ElasticSearch
>>>> implementation in place, whether within the jena-text module or as a
>>>> separate module. Until Lucene and Solr are upgraded to the latest
>>>> version, I will have to maintain a separate module for jena-text-es.
>>>>
>>>> Cheers!
>>>> Anuj Kumar
>>>>
>>>>
>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>
>>>>> Osma--
>>>>>
>>>>> The short answer is that yes, given the right tools you _can_ have
>>>>> different versions of code accessible in different ways. The longer
>>>>> answer is that it's probably not a viable alternative for Jena for this
>>>>> problem, at least not without a lot of other change.
>>>>>
>>>>> You are right to point to the classloader mechanism as being at the
>>>>> heart of this question, but I must alter your remark just slightly. From
>>>>> "the Java classloader only sees a single, flat package/class namespace
>>>>> and a set of compiled classes" to "ANY GIVEN Java classloader only sees
>>>>> a single, flat package/class namespace and a set of compiled classes".
>>>>>
>>>>> This is the fact that OSGi uses to make it possible to maintain strict
>>>>> module boundaries (and even dynamic module relationships at run-time).
>>>>> Each OSGi bundle sees its own classloader, and the framework is
>>>>> responsible for connecting bundles up to ensure that every bundle has
>>>>> what it needs in the way of types to function, based on metadata that
>>>>> the bundles provide to the framework. It's an incredibly powerful system
>>>>> (I use it every day and enjoy it enormously) but it's also very "heavy"
>>>>> and requires a good deal of investment to use. In particular, it's
>>>>> probably too large to put _inside_ Jena. (I frequently put Jena inside
>>>>> an OSGi instance, on the other hand.)
>>>>>
>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>>>> this kind, but it's really meant for the JDK itself, not application
>>>>> libraries. In theory, we could "roll our own" classloader management
>>>>> for this problem. That sounds like more than a bit of a rabbit hole to
>>>>> me. There might be another, more lightweight, toolkit out there to this
>>>>> purpose, but I'm not aware of any myself.
>>>>>
>>>>> Otherwise, yes, you get into shading and the like. We have to do that
>>>>> for Guava for now because of HADOOP-10101 (grumble grumble) but it's
>>>>> hardly a thing we want to do any more of than needed, I don't think.
>>>>>
>>>>> ---
>>>>> A. Soroka
>>>>> The University of Virginia Library
>>>>>
>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>
>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi> wrote:
>>>>>>
>>>>>> Hi Anuj!
>>>>>>
>>>>>> Thanks for the clarification.
>>>>>>
>>>>>> However, I'm still not sure I understand the situation completely. I
>>>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>>>> convenient ways to structure a Java project. Maven cannot change the
>>>>>> fact that at runtime, module divisions don't really matter (except that
>>>>>> they usually correspond to package sub-namespaces) and the Java
>>>>>> classloader only sees a single, flat package/class namespace and a set
>>>>>> of compiled classes (usually within JARs) in the classpath that it
>>>>>> needs to check to find the right classes, and if there are two versions
>>>>>> of the same library (eg Lucene) with overlapping class names, that's
>>>>>> going to cause trouble. The only way around that is to shade some of
>>>>>> the libraries, i.e. rename them so that they end up in another,
>>>>>> non-conflicting namespace. Apparently Elasticsearch also did some of
>>>>>> that in the past [1] but nowadays tries to avoid it.
>>>>>>
>>>>>> Does your assumption 1 ("At a given point in time, only a single
>>>>>> Indexing Technology is used") imply that in the assembler
>>>>>> configuration, you cannot have ja:loadClass declarations for both
>>>>>> Lucene and ES backends? Or how do you run something like Fuseki that
>>>>>> contains (in a single big JAR) both the jena-text and jena-text-es
>>>>>> modules with all their dependencies, one of which requires the Lucene
>>>>>> 4.x classes and the other one the Lucene 6.4.1 classes? How do you
>>>>>> ensure that only one of them is used at a time, and that the Java
>>>>>> classloader, even though it has access to both versions of Lucene, only
>>>>>> loads classes from the single, correct one and not the other? Or do you
>>>>>> need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that
>>>>>> you don't end up with two Lucene versions within the same Fuseki JAR?
>>>>>>
>>>>>> -Osma
>>>>>>
>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>
>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>> Hi Osma,
>>>>>>>
>>>>>>> I understand what you are saying. There are ways to mitigate risks and
>>>>>>> balance the refactoring without affecting the existing modules, but I
>>>>>>> will not delve into those now. I am not enough of a Jena expert to say
>>>>>>> convincingly that it is possible without any hiccups, but I can take a
>>>>>>> guess and say that it is indeed possible :)
>>>>>>>
>>>>>>> For the question: "is it even possible to mix modules that depend on
>>>>>>> different versions of the Lucene libraries within the same project?"
>>>>>>>
>>>>>>> I actually do not understand what you mean by mixing modules. I assume
>>>>>>> you mean having jena-text and jena-text-es as dependencies in a build
>>>>>>> without causing the build to conflict. If that is what you mean, then
>>>>>>> the answer is yes, it is possible and quite simple as well. Let me
>>>>>>> explain how. But first, some assumptions that I want to call out
>>>>>>> explicitly.
>>>>>>>
>>>>>>> *Assumptions:*
>>>>>>> 1. At a given point in time, only a single Indexing Technology is used
>>>>>>> for text-based indexing and searching via Jena. This means that we
>>>>>>> will use either the Lucene implementation OR the Solr implementation
>>>>>>> OR the ES implementation at any given point in time.
>>>>>>> 2. The Fuseki build does not depend on any Lucene 4.9.1 specific
>>>>>>> classes but only on jena-text classes, if at all.
>>>>>>>
>>>>>>> Based on these assumptions it is possible to create a build that
>>>>>>> contains the common jena-text classes + ES-specific classes without
>>>>>>> any compatibility issues. And it is in fact quite simple. I did it in
>>>>>>> the current jena-text-es module and ran the entire build, which
>>>>>>> succeeded.
>>>>>>> The key is to include the latest Lucene dependencies at the very
>>>>>>> beginning of the pom and then include the jena-text dependency. Maven
>>>>>>> will then automatically resolve the dependency issues by including the
>>>>>>> Lucene libraries that we included in our ES-specific pom. Have a look
>>>>>>> at the pom of the jena-text-es module here to see how it can be done:
>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anuj Kumar
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen
>>>>>>> <osma.suominen@helsinki.fi> wrote:
>>>>>>>
>>>>>>>> Hi Anuj,
>>>>>>>>
>>>>>>>> I understand your concerns. However, we also need to balance between
>>>>>>>> the needs of individual modules/features and the whole codebase. I'm
>>>>>>>> willing to put in the effort to keep the other modules up to date
>>>>>>>> with newer Lucene versions. Lucene upgrade requirements are well
>>>>>>>> documented, the only hitches seen in JENA-1250 were related to how
>>>>>>>> jena-text (ab)used some Lucene features that were dropped from newer
>>>>>>>> versions.
>>>>>>>>
>>>>>>>> A perhaps stupid question to more experienced Java developers: is it
>>>>>>>> even possible to mix modules that depend on different versions of
>>>>>>>> the Lucene libraries within the same project? In my (quite limited)
>>>>>>>> understanding of Java projects and libraries, this requires special
>>>>>>>> arrangements (e.g. shading) as the Java package/class namespace is
>>>>>>>> shared by all the code running within the same JVM.
>>>>>>>>
>>>>>>>> So can you create, say, a Fuseki build that contains the current
>>>>>>>> jena-text module (depending on Lucene 4.x) and the new jena-text-es
>>>>>>>> module (depending on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> My 2 cents:
>>>>>>>>>
>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and
>>>>>>>>> ES is exactly to avoid the "All or Nothing" approach we need to
>>>>>>>>> take if we club them all together. If they stay together and if in
>>>>>>>>> the near future I want to upgrade ES to another version, I also
>>>>>>>>> need to upgrade Lucene and Solr again, and possibly another
>>>>>>>>> implementation that may have been added in the meantime. As we all
>>>>>>>>> know, this means weeks of work, if not months, to get the changes
>>>>>>>>> released. This will personally de-motivate me to do anything, and I
>>>>>>>>> will probably start maintaining my own version of Jena-Text, as
>>>>>>>>> that would be much simpler to do than to upgrade and test and in
>>>>>>>>> the process own (read: fix bugs in) the upgrade for each and every
>>>>>>>>> technology.
>>>>>>>>>
>>>>>>>>> If they are developed as separate modules, they can evolve
>>>>>>>>> independently of each other and we can avoid situations where we
>>>>>>>>> can't upgrade to the latest version of Lucene because we do not
>>>>>>>>> know what effect it will have on the Solr implementation.
>>>>>>>>>
>>>>>>>>> We can start with having a separate module for Jena Text ES and see
>>>>>>>>> how things go. If they go well, we could extract Solr and Lucene
>>>>>>>>> out of Jena Text.
>>>>>>>>>
>>>>>>>>> Again, this is just a suggestion based on my limited industry
>>>>>>>>> experience.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anuj Kumar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen
>>>>>>>>> <osma.suominen@helsinki.fi> wrote:
>>>>>>>>>
>>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>>
>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>>>> ? In other words, might it be better to factor out between -text
>>>>>>>>>>> and -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>
>>>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer
>>>>>>>>>> to do the actual work!
>>>>>>>>>>
>>>>>>>>>>> I don't use the Solr component now, but I could easily see so
>>>>>>>>>>> doing... that's pretty vague, I know, and I'm not in a position to
>>>>>>>>>>> do any work to maintain it, so consider that just a very small and
>>>>>>>>>>> blurry data point. :)
>>>>>>>>>>
>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how
>>>>>>>>>> to get it running... If you could just try that with some toy data,
>>>>>>>>>> then your data point would be a lot less blurry :) I haven't used
>>>>>>>>>> Solr for anything, so I'm not very familiar with how to set it up,
>>>>>>>>>> and the jena-text instructions are pretty vague unfortunately.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Osma


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
I second that. I am now finalising the integration of ES and should have a
good production-quality implementation ready in a week's time. At that
time I would like you guys to have a look at the implementation and provide
feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge the
code into the jena-text module and do a round of testing.

Thanks,
Anuj Kumar

On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:

> I do agree that trying to juggle different versions of Lucene libraries is
> probably not a realistic option right now. Luckily (if I understand the
> conversation thus far correctly) we have a solid alternative; getting our
> current Lucene dependency upgraded should allow us to (eventually) merge
> Anuj's work into the mainstream of development. Someone please tell me if I
> have that wrong! :grin:
>
> Let me reiterate that this seems like very good work and speaking for
> myself, I certainly want to get it included into Jena. It's just a question
> of fitting it in correctly, which might take a bit of time.
>
> ---
> A. Soroka
> The University of Virginia Library
>

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by "A. Soroka" <aj...@virginia.edu>.
I do agree that trying to juggle different versions of Lucene libraries is probably not a realistic option right now. Luckily (if I understand the conversation thus far correctly) we have a solid alternative; getting our current Lucene dependency upgraded should allow us to (eventually) merge Anuj's work into the mainstream of development. Someone please tell me if I have that wrong! :grin:

Let me reiterate that this seems like very good work and speaking for myself, I certainly want to get it included into Jena. It's just a question of fitting it in correctly, which might take a bit of time. 

---
A. Soroka
The University of Virginia Library

> On Mar 1, 2017, at 1:27 PM, Osma Suominen <os...@helsinki.fi> wrote:
> 
> Hi Anuj!
> 
> I have nothing against modularity in general. However, I cannot see how your proposal could work in practice for the Fuseki build, due to the reasons I mentioned in my previous message (and Adam seemed to concur).
> 
> In any case, I'll see what I can do to get the Lucene upgrade moving again. If all current Jena modules (ie jena-text and jena-spatial) were upgraded to Lucene 6.4.1, then you could just add your ES classes to jena-text, right? I think that would be better for everyone than having to maintain your own separate module.
> 
> -Osma
> 
> 01.03.2017, 16:59, anuj kumar wrote:
>> I personally have no preference as to how the code in Jena should be
>> structured, as long as I am able to use it :).
>> I have a personal preference for doing it in a specific way because, IMO, it is
>> modular, which makes it much easier to maintain in the long run. But again,
>> it may not be the quickest one.
>> 
>> I have already been given a deadline by the company to have the ES extension
>> implemented in the next 15 days :). What this means is that I will be
>> maintaining the ES code extension to Jena Text at least locally for the
>> coming period of time. I would be more than happy to contribute to the Jena
>> community whatever is required to have a proper ElasticSearch
>> implementation in place, whether within the jena-text module or as a separate
>> module. Until Lucene and Solr are upgraded to the latest
>> version, I will have to maintain a separate module for jena-text-es.
>> 
>> Cheers!
>> Anuj Kumar
>> 
>> 
>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>> 
>>> Osma--
>>> 
>>> The short answer is that yes, given the right tools you _can_ have
>>> different versions of code accessible in different ways. The longer answer
>>> is that it's probably not a viable alternative for Jena for this problem,
>>> at least not without a lot of other change.
>>> 
>>> You are right to point to the classloader mechanism as being at the heart
>>> of this question, but I must alter your remark just slightly. From "the
>>> Java classloader only sees a single, flat package/class namespace and a set
>>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>>> flat package/class namespace and a set of compiled classes".
>>> 
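The "ANY GIVEN classloader" point quoted above can be illustrated with a minimal Java sketch. It assumes nothing about Jena or OSGi internals: LoaderIsolationDemo, Payload and IsolatingLoader are hypothetical names, and Payload merely stands in for a library class such as a Lucene type. Two loaders each define their own copy of the same class name, so the JVM holds two distinct classes that share a name. This shows the general mechanism only, not how OSGi itself is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class LoaderIsolationDemo {

    /** Hypothetical stand-in for a library class, e.g. some Lucene type. */
    static class Payload { }

    /** Defines its own copy of one named class; delegates everything else to the parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(ClassLoader parent, String target) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the compiled bytes of the class through the parent loader's resources. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** True if two loaders yield same-named but distinct Class objects. */
    static boolean sameNameDifferentClass() {
        String name = Payload.class.getName();
        ClassLoader app = LoaderIsolationDemo.class.getClassLoader();
        try {
            Class<?> a = new IsolatingLoader(app, name).loadClass(name);
            Class<?> b = new IsolatingLoader(app, name).loadClass(name);
            return a.getName().equals(b.getName()) && a != b && a != Payload.class;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameNameDifferentClass());
    }
}
```

On a single flat classpath there is no such isolation: every module shares one loader and therefore one copy of each class name, which is why two Lucene versions cannot simply sit side by side in one Fuseki JAR.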
>>> This is the fact that OSGi uses to make it possible to maintain strict
>>> module boundaries (and even dynamic module relationships at run-time). Each
>>> OSGi bundle sees its own classloader, and the framework is responsible for
>>> connecting bundles up to ensure that every bundle has what it needs in the
>>> way of types to function, based on metadata that the bundles provide to the
>>> framework. It's an incredibly powerful system (I use it every day and enjoy
>>> it enormously) but it's also very "heavy" and requires a good deal of
>>> investment to use. In particular, it's probably too large to put _inside_
>>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>> 
>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>>> this kind, but it's really meant for the JDK itself, not application
>>> libraries. In theory, we could "roll our own" classloader management for
>>> this problem. That sounds like more than a bit of a rabbit hole to me.
>>> There might be another, more lightweight, toolkit out there to this
>>> purpose, but I'm not aware of any myself.
>>> 
>>> Otherwise, yes, you get into shading and the like. We have to do that for
>>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>>> thing we want to do any more of than needed, I don't think.
>>> 
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>> 
>>> [1] http://openjdk.java.net/projects/jigsaw/
>>> 
>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi> wrote:
>>>> 
>>>> Hi Anuj!
>>>> 
>>>> Thanks for the clarification.
>>>> 
>>>> However, I'm still not sure I understand the situation completely. I
>>>> know Maven can perform a lot of tricks, but Maven modules are just
>>>> convenient ways to structure a Java project. Maven cannot change the fact
>>>> that at runtime, module divisions don't really matter (except that they
>>>> usually correspond to package sub-namespaces) and the Java classloader only
>>>> sees a single, flat package/class namespace and a set of compiled classes
>>>> (usually within JARs) in the classpath that it needs to check to find the
>>>> right classes. If there are two versions of the same library (e.g.
>>>> Lucene) with overlapping class names, that's going to cause trouble. The
>>>> only way around that is to shade some of the libraries, i.e. rename them so
>>>> that they end up in another, non-conflicting namespace. Apparently
>>>> Elasticsearch also did some of that in the past [1] but nowadays tries to
>>>> avoid it.
>>>> 
>>>> Does your assumption 1 ("At a given point in time, only a single
>>>> Indexing Technology is used") imply that in the assembler configuration,
>>>> you cannot have ja:loadClass declarations for both Lucene and ES backends?
>>>> Or how do you run something like Fuseki that contains (in a single big JAR)
>>>> both the jena-text and jena-text-es modules with all their dependencies,
>>>> one of which requires the Lucene 4.x classes and the other one the Lucene
>>>> 6.4.1 classes? How do you ensure that only one of them is used at a time,
>>>> and that the Java classloader, even though it has access to both versions
>>>> of Lucene, only loads classes from the single, correct one and not the
>>>> other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
>>>> packages, so that you don't end up with two Lucene versions within the same
>>>> Fuseki JAR?
>>>> 
>>>> -Osma
>>>> 
>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>> 
>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>> Hi Osma,
>>>>> 
>>>>> I understand what you are saying. There are ways to mitigate risks and
>>>>> balance the refactoring without affecting the existing modules, but I will
>>>>> not delve into those now. I am not enough of a Jena expert to say convincingly
>>>>> that it is possible without any hiccups, but I can take a guess and say
>>>>> that it is indeed possible :)
>>>>> 
>>>>> For the question: "is it even possible to mix modules that depend on
>>>>> different versions of the Lucene libraries within the same project?"
>>>>> 
>>>>> I actually do not understand what you mean by mixing modules. I assume you
>>>>> mean having jena-text and jena-text-es as dependencies in a build without
>>>>> causing the build to conflict. If that is what you mean, then the answer is
>>>>> yes, it is possible and quite simple as well. Let me explain how it is
>>>>> possible. But before that, some assumptions which I want to call out
>>>>> explicitly.
>>>>> 
>>>>> *Assumptions:*
>>>>> 1. At a given point in time, only a single Indexing Technology is used for
>>>>> text-based indexing and searching via Jena. What this means is that we will
>>>>> use either the Lucene implementation OR the Solr implementation OR the ES
>>>>> implementation at any given point in time.
>>>>> 2. The Fuseki build does not depend on any Lucene 4.9.1 specific classes but
>>>>> only on jena-text classes, if at all.
>>>>> 
>>>>> Based on these assumptions it is possible to create a build that contains
>>>>> jena-text based common classes + ES specific classes without any
>>>>> compatibility issues. And it is in fact quite simple. I did it in the
>>>>> current jena-text-es module and ran the entire build, which succeeded.
>>>>> The key is to include the latest Lucene dependencies at the very beginning
>>>>> in the pom and then include the jena-text dependency. Maven will then
>>>>> automatically resolve the dependency issues by including the Lucene
>>>>> libraries that we included in our ES-specific pom. Have a look at the pom of
>>>>> the jena-text-es module here to see how it can be done:
>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>> 
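A rough model of why declaring the newer Lucene directly has this effect. Maven's documented dependency mediation picks the "nearest definition" of an artifact in the dependency tree and, at equal depth, the first declaration, so only one version of each artifact reaches the classpath. The Java sketch below models just that rule. MediationSketch, Dep, the depths and the jena-text version string are hypothetical, and real Maven resolution also involves scopes, exclusions and dependencyManagement, which are ignored here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MediationSketch {

    /** One node of a flattened dependency tree; depth 1 means declared directly in the pom. */
    static final class Dep {
        final String artifact;
        final String version;
        final int depth;

        Dep(String artifact, String version, int depth) {
            this.artifact = artifact;
            this.version = version;
            this.depth = depth;
        }
    }

    /** Nearest definition wins; at equal depth the earlier declaration is kept. */
    static Map<String, String> mediate(List<Dep> declarationOrder) {
        Map<String, Dep> chosen = new LinkedHashMap<>();
        for (Dep d : declarationOrder) {
            Dep current = chosen.get(d.artifact);
            if (current == null || d.depth < current.depth) {
                chosen.put(d.artifact, d);
            }
        }
        Map<String, String> versions = new LinkedHashMap<>();
        for (Dep d : chosen.values()) {
            versions.put(d.artifact, d.version);
        }
        return versions;
    }

    /** The jena-text-es situation as described in the thread (depths are assumed). */
    static Map<String, String> jenaTextEsExample() {
        List<Dep> deps = new ArrayList<>();
        deps.add(new Dep("lucene-core", "6.4.1", 1)); // declared directly in the ES pom
        deps.add(new Dep("jena-text", "SNAPSHOT", 1)); // version string is hypothetical
        deps.add(new Dep("lucene-core", "4.9.1", 2)); // pulled in transitively by jena-text
        return mediate(deps);
    }

    public static void main(String[] args) {
        System.out.println(jenaTextEsExample());
    }
}
```

In this model only lucene-core 6.4.1 survives, so the build resolves cleanly, but jena-text would then run against Lucene 6.4.1 although it was written for Lucene 4.x. That is the runtime concern raised elsewhere in this thread.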
>>>>> 
>>>>> Thanks,
>>>>> Anuj Kumar
>>>>> 
>>>>> 
>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <osma.suominen@helsinki.fi>
>>>>> wrote:
>>>>> 
>>>>>> Hi Anuj,
>>>>>> 
>>>>>> I understand your concerns. However, we also need to balance between the
>>>>>> needs of individual modules/features and the whole codebase. I'm willing to
>>>>>> put in the effort to keep the other modules up to date with newer Lucene
>>>>>> versions. Lucene upgrade requirements are well documented, the only hitches
>>>>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>>> features that were dropped from newer versions.
>>>>>> 
>>>>>> A perhaps stupid question to more experienced Java developers: is it even
>>>>>> possible to mix modules that depend on different versions of the Lucene
>>>>>> libraries within the same project? In my (quite limited) understanding of
>>>>>> Java projects and libraries, this requires special arrangements (e.g.
>>>>>> shading) as the Java package/class namespace is shared by all the code
>>>>>> running within the same JVM.
>>>>>> 
>>>>>> So can you create, say, a Fuseki build that contains the current jena-text
>>>>>> module (depending on Lucene 4.x) and the new jena-text-es module (depending
>>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>> 
>>>>>> -Osma
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> My 2 Cents :
>>>>>>> 
>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and ES is
>>>>>>> exactly to avoid the "All or Nothing" approach we need to take if we
>>>>>>> club them all together. If they stay together and in the near future I
>>>>>>> want to upgrade ES to another version, I also need to upgrade Lucene
>>>>>>> and Solr again, and possibly another implementation that may have been added
>>>>>>> in the meantime. As we all know, this means weeks of work, if not months, to
>>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>>> anything and I will probably start maintaining my version of Jena-Text as
>>>>>>> that would be much simpler to do than to upgrade and test and in the
>>>>>>> process own (read: fix bugs in) the upgrade for each and every technology.
>>>>>>> 
>>>>>>> If they are developed as separate modules, they can evolve independently of
>>>>>>> each other and we can avoid situations where we can't upgrade to the latest
>>>>>>> version of Lucene because we do not know what effect it will have on the
>>>>>>> Solr implementation.
>>>>>>> 
>>>>>>> We can start with having a separate module for Jena Text ES and see how
>>>>>>> things go. If they go well, we could extract Solr and Lucene out of
>>>>>>> Jena Text.
>>>>>>> 
>>>>>>> Again this is just a suggestion based on my limited industry experience.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Anuj Kumar
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suominen@helsinki.fi>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>> 
>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>>> ? In other words, might it be better to factor out between -text and
>>>>>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>> 
>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer to do
>>>>>>>> the actual work!
>>>>>>>> 
>>>>>>>>> I don't use the Solr component now, but I could easily see so doing...
>>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any work to
>>>>>>>>> maintain it, so consider that just a very small and blurry data point.
>>>>>>>>> :)
>>>>>>>>> 
>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how to get
>>>>>>>> it running... If you could just try that with some toy data, then your data
>>>>>>>> point would be a lot less blurry :) I haven't used Solr for anything, so
>>>>>>>> I'm not very familiar with how to set it up, and the jena-text instructions
>>>>>>>> are pretty vague unfortunately.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -Osma


Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
I agree, Osma. If Lucene is upgraded to 6.4.1, it would be much easier for me
to integrate the Elasticsearch implementation.

But I am still waiting for someone to provide me a hint as to how I can
index multiple predicate values. This is the most pressing issue for me
currently.

Thanks,
Anuj Kumar


Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Anuj!

I have nothing against modularity in general. However, I cannot see how 
your proposal could work in practice for the Fuseki build, due to the 
reasons I mentioned in my previous message (and Adam seemed to concur).
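
The classloader point from the quoted discussion below can be illustrated with 
a minimal sketch. This is not Jena or Fuseki code, and every name in it 
(LoaderIsolationDemo, IsolatedLoader, demo.Lib) is hypothetical. It builds the 
bytes of an empty class and defines it in two separate loaders, which stands in 
for two versions of a library class with the same fully qualified name:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LoaderIsolationDemo {

    /** Hypothetical class name standing in for a library class (e.g. a Lucene type). */
    static final String NAME = "demo.Lib";

    /** Builds the bytes of an empty public class demo.Lib that extends Object. */
    static byte[] emptyClassBytes() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(0xCAFEBABE);                            // magic number
        out.writeShort(0);                                   // minor version
        out.writeShort(52);                                  // major version (Java 8)
        out.writeShort(5);                                   // constant pool count (entries + 1)
        out.writeByte(7); out.writeShort(2);                 // #1: Class, name at #2
        out.writeByte(1); out.writeUTF("demo/Lib");          // #2: Utf8
        out.writeByte(7); out.writeShort(4);                 // #3: Class, name at #4
        out.writeByte(1); out.writeUTF("java/lang/Object");  // #4: Utf8
        out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                                   // this_class = #1
        out.writeShort(3);                                   // super_class = #3
        out.writeShort(0);                                   // no interfaces
        out.writeShort(0);                                   // no fields
        out.writeShort(0);                                   // no methods
        out.writeShort(0);                                   // no attributes
        return buf.toByteArray();
    }

    /** A loader that defines demo.Lib itself; its parent cannot find that class. */
    static final class IsolatedLoader extends ClassLoader {
        private final byte[] bytes;

        IsolatedLoader(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!NAME.equals(name)) {
                throw new ClassNotFoundException(name);
            }
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Loads demo.Lib through a fresh loader, i.e. into a fresh namespace. */
    static Class<?> loadInFreshLoader() throws Exception {
        return new IsolatedLoader(emptyClassBytes()).loadClass(NAME);
    }

    public static void main(String[] args) throws Exception {
        Class<?> first = loadInFreshLoader();
        Class<?> second = loadInFreshLoader();
        // Compare the fully qualified names of the two classes.
        System.out.println(first.getName().equals(second.getName()));
        // Compare the Class objects themselves.
        System.out.println(first == second);
    }
}
```

Each loader ends up with its own demo.Lib, so two same-named classes can only 
coexist when they live in different loaders. A single Fuseki JAR started the 
usual way has, as far as I understand, one application classloader, which is 
why two Lucene versions in it would collide unless one is shaded or the 
loaders are managed explicitly (as OSGi does).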

In any case, I'll see what I can do to get the Lucene upgrade moving 
again. If all current Jena modules (i.e. jena-text and jena-spatial) were 
upgraded to Lucene 6.4.1, then you could just add your ES classes to 
jena-text, right? I think that would be better for everyone than having 
to maintain your own separate module.

-Osma

> 01.03.2017, 16:59, anuj kumar wrote:
> I personally have no preference as to how the code in Jena should be
> structured, as long as I am able to use it :).
> I have personal preference of doing it in a specific way because IMO, it is
> modular which makes it much easier to maintain in the long run. But again
> it may not be the quickest one.
>
> I already have been given a deadline, by the company to have ES extension
> implemented in the next 15 days :). What this means is that I will be
> maintaining the ES code extension to Jena Text at-least locally for a
> coming period of time. I would be more than happy to contribute to Jena
> community whatever is required to have a proper ElasticSearch
> Implementation in place, whether within jena-text module or as a separate
> module. Till the time Lucene and Solr is not upgraded to the latest
> version, I will have to maintain a separate module for jena-text-es.
>
> Cheers!
> Anuj Kumar
>
>
> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>
>> Osma--
>>
>> The short answer is that yes, given the right tools you _can_ have
>> different versions of code accessible in different ways. The longer answer
>> is that it's probably not a viable alternative for Jena for this problem,
>> at least not without a lot of other change.
>>
>> You are right to point to the classloader mechanism as being at the heart
>> of this question, but I must alter your remark just slightly. From "the
>> Java classloader only sees a single, flat package/class namespace and a set
>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>> flat package/class namespace and a set of compiled classes".
>>
>> This is the fact that OSGi uses to make it possible to maintain strict
>> module boundaries (and even dynamic module relationships at run-time). Each
>> OSGi bundle sees its own classloader, and the framework is responsible for
>> connecting bundles up to ensure that every bundle has what it needs in the
>> way of types to function, based on metadata that the bundles provide to the
>> framework. It's an incredibly powerful system (I use it every day and enjoy
>> it enormously) but it's also very "heavy" and requires a good deal of
>> investment to use. In particular, it's probably too large to put _inside_
>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>
>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>> this kind, but it's really meant for the JDK itself, not application
>> libraries. In theory, we could "roll our own" classloader management for
>> this problem. That sounds like more than a bit of a rabbit hole to me.
>> There might be another, more lightweight, toolkit out there to this
>> purpose, but I'm not aware of any myself.
>>
>> Otherwise, yes, you get into shading and the like. We have to do that for
>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>> thing we want to do any more of than needed, I don't think.
>>
>> ---
>> A. Soroka
>> The University of Virginia Library
>>
>> [1] http://openjdk.java.net/projects/jigsaw/
>>
>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi>
>> wrote:
>>>
>>> Hi Anuj!
>>>
>>> Thanks for the clarification.
>>>
>>> However, I'm still not sure I understand the situation completely. I
>> know Maven can perform a lot of tricks, but Maven modules are just
>> convenient ways to structure a Java project. Maven cannot change the fact
>> that at runtime, module divisions don't really matter (except that they
>> usually correspond to package sub-namespaces) and the Java classloader only
>> sees a single, flat package/class namespace and a set of compiled classes
>> (usually within JARs) in the classpath that it needs to check to find the
>> right classes, and if there are two versions of the same library (eg
>> Lucene) with overlapping class names, that's going to cause trouble. The
>> only way around that is to shade some of the libraries, i.e. rename them so
>> that they end up in another, non-conflicting namespace. Apparently
>> Elasticsearch also did some of that in the past [1] but nowadays tries to
>> avoid it.
>>>
>>> Does your assumption 1 ("At a given point in time, only a single
>> Indexing Technology is used") imply that in the assembler configuration,
>> you cannot have ja:loadClass declarations for both Lucene and ES backends?
>> Or how do you run something like Fuseki that contains (in a single big JAR)
>> both the jena-text and jena-text-es modules with all their dependencies,
>> one of which requires the Lucene 4.x classes and the other one the Lucene
>> 6.4.1 classes? How do you ensure that only one of them is used at a time,
>> and that the Java classloader, even though it has access to both versions
>> of Lucene, only loads classes from the single, correct one and not the
>> other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
>> packages, so that you don't end up with two Lucene versions within the same
>> Fuseki JAR?
>>>
>>> -Osma
>>>
>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>
>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>> Hi Osma,
>>>>
>>>> I understand what you are saying. There are ways to mitigate risks and
>>>> balance the refactoring without affecting the existing modules. But I will
>>>> not delve into those now. I am not enough of a Jena expert to say convincingly
>>>> that it is possible without any hiccups. But I can take a guess and say
>>>> that it is indeed possible :)
>>>>
>>>> For the question: "is it even possible to mix modules that depend on
>>>> different versions of the Lucene libraries within the same project?"
>>>>
>>>> I actually do not understand what you mean by mixing modules. I assume you
>>>> mean having jena-text and jena-text-es as dependencies in a build without
>>>> causing the build to conflict. If that is what you mean, then the answer is
>>>> yes, it is possible and quite simple as well. Let me explain how it is
>>>> possible. But before that, some assumptions which I want to call out
>>>> explicitly.
>>>>
>>>> *Assumptions:*
>>>> 1. At a given point in time, only a single indexing technology is used for
>>>> text-based indexing and searching via Jena. What this means is that we will
>>>> either use the Lucene implementation OR the Solr implementation OR the ES
>>>> implementation at any given point in time.
>>>> 2. The Fuseki build does not depend on any Lucene 4.9.1 specific classes but
>>>> only on jena-text classes, if at all.
>>>>
>>>> Based on these assumptions, it is possible to create a build that contains
>>>> jena-text based common classes + ES specific classes without any
>>>> compatibility issues. And it is in fact quite simple. I did it in the
>>>> current jena-text-es module and ran the entire build, which succeeded.
>>>> The key is to include the latest Lucene dependencies at the very beginning
>>>> in the pom and then include the jena-text dependency. Maven will then
>>>> automatically resolve the dependency issues by including the Lucene
>>>> libraries that we included in our ES specific pom. Have a look at the pom of
>>>> the jena-text-es module here to see how it can be done:
>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>
>>>>
>>>> Thanks,
>>>> Anuj Kumar
>>>>
>>>>
>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
>> osma.suominen@helsinki.fi>
>>>> wrote:
>>>>
>>>>> Hi Anuj,
>>>>>
>>>>> I understand your concerns. However, we also need to balance between the
>>>>> needs of individual modules/features and the whole codebase. I'm willing to
>>>>> put in the effort to keep the other modules up to date with newer Lucene
>>>>> versions. Lucene upgrade requirements are well documented, the only hitches
>>>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>>>> features that were dropped from newer versions.
>>>>>
>>>>> A perhaps stupid question to more experienced Java developers: is it even
>>>>> possible to mix modules that depend on different versions of the Lucene
>>>>> libraries within the same project? In my (quite limited) understanding of
>>>>> Java projects and libraries, this requires special arrangements (e.g.
>>>>> shading) as the Java package/class namespace is shared by all the code
>>>>> running within the same JVM.
>>>>>
>>>>> So can you create, say, a Fuseki build that contains the current jena-text
>>>>> module (depending on Lucene 4.x) and the new jena-text-es module (depending
>>>>> on Lucene 6.4.1) without any compatibility issues?
>>>>>
>>>>> -Osma
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> My 2 Cents :
>>>>>>
>>>>>> The reason I proposed to have separate modules for Lucene, Solr and ES is
>>>>>> exactly for avoiding the "All or Nothing" approach we need to take if we
>>>>>> club them all together. If they stay together and if in the near future I
>>>>>> want to upgrade ES to another version, I also need to again upgrade Lucene
>>>>>> and Solr and possibly another implementation that may have been added
>>>>>> in the meantime. As we all know, this means weeks of work if not months to
>>>>>> get the changes released. This will personally de-motivate me to do
>>>>>> anything and I will probably start maintaining my version of Jena-Text as
>>>>>> that would be much simpler to do than to upgrade and test and in the
>>>>>> process own (read: fix bugs in) the upgrade for each and every technology.
>>>>>>
>>>>>> If they are developed as separate modules, they can evolve independently of
>>>>>> each other and we can avoid situations where we can't upgrade to the latest
>>>>>> version of Lucene because we do not know what effect it will have on the
>>>>>> Solr implementation.
>>>>>>
>>>>>> We can start with having a separate module for Jena Text ES and see how
>>>>>> things go. If they go well, we could extract Solr and Lucene out of
>>>>>> Jena Text.
>>>>>>
>>>>>> Again, this is just a suggestion based on my limited industry experience.
>>>>>>
>>>>>> Thanks,
>>>>>> Anuj Kumar
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
>> osma.suominen@helsinki.fi
>>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>
>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>>>> ? In other words, might it be better to factor out between -text and
>>>>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>
>>>>>>>>
>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer to do
>>>>>>> the actual work!
>>>>>>>
>>>>>>> I don't use the Solr component now, but I could easily see so doing...
>>>>>>>
>>>>>>>> that's pretty vague, I know, and I'm not in a position to do any work to
>>>>>>>> maintain it, so consider that just a very small and blurry data point.
>>>>>>>> :)
>>>>>>>>
>>>>>>>>
>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how to get
>>>>>>> it running... If you could just try that with some toy data, then your data
>>>>>>> point would be a lot less blurry :) I haven't used Solr for anything, so
>>>>>>> I'm not very familiar with how to set it up, and the jena-text instructions
>>>>>>> are pretty vague unfortunately.
>>>>>>>
>>>>>>>
>>>>>>> -Osma
>>>>>>>
>>>>>>>


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
BTW, I have one more question:

How do I add more than one field to be indexed in my index?
Basically, if I want to index rdfs:label and rdfs:comment in the same index
document, how do I do it?

I tried:

EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
entDef.setPrimaryPredicate(RDFS.label);
entDef.setGraphField(GRAPH_FIELD_NAME);
entDef.set("comment", RDFS.comment.asNode());

But it doesn't work. Could you please point me to a way to do this? This
is an important piece of functionality I need.

Thanks,
Anuj Kumar


On Wed, Mar 1, 2017 at 3:59 PM, anuj kumar <an...@gmail.com> wrote:

> I personally have no preference as to how the code in Jena should be
> structured, as long as I am able to use it :).
> I have personal preference of doing it in a specific way because IMO, it
> is modular which makes it much easier to maintain in the long run. But
> again it may not be the quickest one.
>
> I already have been given a deadline, by the company to have ES extension
> implemented in the next 15 days :). What this means is that I will be
> maintaining the ES code extension to Jena Text at-least locally for a
> coming period of time. I would be more than happy to contribute to Jena
> community whatever is required to have a proper ElasticSearch
> Implementation in place, whether within jena-text module or as a separate
> module. Till the time Lucene and Solr is not upgraded to the latest
> version, I will have to maintain a separate module for jena-text-es.
>
> Cheers!
> Anuj Kumar
>
>
> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>
>> Osma--
>>
>> The short answer is that yes, given the right tools you _can_ have
>> different versions of code accessible in different ways. The longer answer
>> is that it's probably not a viable alternative for Jena for this problem,
>> at least not without a lot of other change.
>>
>> You are right to point to the classloader mechanism as being at the heart
>> of this question, but I must alter your remark just slightly. From "the
>> Java classloader only sees a single, flat package/class namespace and a set
>> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
>> flat package/class namespace and a set of compiled classes".
>>
>> This is the fact that OSGi uses to make it possible to maintain strict
>> module boundaries (and even dynamic module relationships at run-time). Each
>> OSGi bundle sees its own classloader, and the framework is responsible for
>> connecting bundles up to ensure that every bundle has what it needs in the
>> way of types to function, based on metadata that the bundles provide to the
>> framework. It's an incredibly powerful system (I use it every day and enjoy
>> it enormously) but it's also very "heavy" and requires a good deal of
>> investment to use. In particular, it's probably too large to put _inside_
>> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>
>> Java 9 Jigsaw [1] offers some possibility for strong modularization of
>> this kind, but it's really meant for the JDK itself, not application
>> libraries. In theory, we could "roll our own" classloader management for
>> this problem. That sounds like more than a bit of a rabbit hole to me.
>> There might be another, more lightweight, toolkit out there to this
>> purpose, but I'm not aware of any myself.
>>
>> Otherwise, yes, you get into shading and the like. We have to do that for
>> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
>> thing we want to do any more of than needed, I don't think.
>>
>> ---
>> A. Soroka
>> The University of Virginia Library
>>
>> [1] http://openjdk.java.net/projects/jigsaw/
>>
>> > On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi>
>> wrote:
>> >
>> > Hi Anuj!
>> >
>> > Thanks for the clarification.
>> >
>> > However, I'm still not sure I understand the situation completely. I
>> know Maven can perform a lot of tricks, but Maven modules are just
>> convenient ways to structure a Java project. Maven cannot change the fact
>> that at runtime, module divisions don't really matter (except that they
>> usually correspond to package sub-namespaces) and the Java classloader only
>> sees a single, flat package/class namespace and a set of compiled classes
>> (usually within JARs) in the classpath that it needs to check to find the
>> right classes, and if there are two versions of the same library (eg
>> Lucene) with overlapping class names, that's going to cause trouble. The
>> only way around that is to shade some of the libraries, i.e. rename them so
>> that they end up in another, non-conflicting namespace. Apparently
>> Elasticsearch also did some of that in the past [1] but nowadays tries to
>> avoid it.
>> >
>> > Does your assumption 1 ("At a given point in time, only a single
>> Indexing Technology is used") imply that in the assembler configuration,
>> you cannot have ja:loadClass declarations for both Lucene and ES backends?
>> Or how do you run something like Fuseki that contains (in a single big JAR)
>> both the jena-text and jena-text-es modules with all their dependencies,
>> one of which requires the Lucene 4.x classes and the other one the Lucene
>> 6.4.1 classes? How do you ensure that only one of them is used at a time,
>> and that the Java classloader, even though it has access to both versions
>> of Lucene, only loads classes from the single, correct one and not the
>> other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
>> packages, so that you don't end up with two Lucene versions within the same
>> Fuseki JAR?
>> >
>> > -Osma
>> >
>> > [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>> >
>> > 01.03.2017, 11:03, anuj kumar kirjoitti:
>> >> Hi Osma,
>> >>
>> >> I understand what you are saying. There are ways to mitigate risks and
>> >> balance the refactoring without affecting the existing modules. But I
>> will
>> >> not delve into those now. I am not an expert in Jena to convincingly
>> say
>> >> that it is possible, without any hiccups. But I can take a guess and
>> say
>> >> that it is indeed possible :)
>> >>
>> >> For the question: "is it even possible to mix modules that depend on
>> >> different versions of the Lucene libraries within the same project?"
>> >>
>> >> I actually do not understand what you mean by mixing modules. I assume
>> you
>> >> mean having jena-text and jena-text-es as dependencies in a build
>> without
>> >> causing the build to conflict. If that is what you mean then the
>> answer is
>> >> yes it is possible and quite simple as well. Let me explain how it is
>> >> possible. But before that some assumption which I want to call out
>> >> explicitly.
>> >>
>> >> *Assumption:*
>> >> 1. At a given point in time, only a single Indexing Technology is used
>> for
>> >> text based indexing and searching via Jena. What this means is that we
>> will
>> >> either use Lucene Implementation OR Solr Implementation OR ES
>> >> Implementation at any given point in time.
>> >> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
>> but
>> >> only on jena-text classes, if at all.
>> >>
>> >> Based on these assumptions it is possible to create a build that
>> contains
>> >> jena-text based common classes + ES specific classes without any
>> >> compatibility issues. And it is in fact quite simple. I did it in the
>> >> current jena-text-es module and ran the entire build which succeeded.
>> >> The key is to include the latest Lucene dependencies at the very
>> beginning
>> >> in the pom and then include jena-text dependency. Maven will then
>> >> automatically resolve the dependency issues by including the Lucene
>> >> libraries that we included in our es specific pom. Have a look at the
>> pom of
>> >> jena-text-es module here to see how it can be done :
>> >> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>> >>
>> >>
>> >> Thanks,
>> >> Anuj Kumar
>> >>
>> >>
>> >> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
>> osma.suominen@helsinki.fi>
>> >> wrote:
>> >>
>> >>> Hi Anuj,
>> >>>
>> >>> I understand your concerns. However, we also need to balance between
>> the
>> >>> needs of individual modules/features and the whole codebase. I'm
>> willing to
>> >>> put in the effort to keep the other modules up to date with newer
>> Lucene
>> >>> versions. Lucene upgrade requirements are well documented, the only
>> hitches
>> >>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>> >>> features that were dropped from newer versions.
>> >>>
>> >>> A perhaps stupid question to more experienced Java developers: is it
>> even
>> >>> possible to mix modules that depend on different versions of the
>> Lucene
>> >>> libraries within the same project? In my (quite limited)
>> understanding of
>> >>> Java projects and libraries, this requires special arrangements (e.g.
>> >>> shading) as the Java package/class namespace is shared by all the code
>> >>> running within the same JVM.
>> >>>
>> >>> So can you create, say, a Fuseki build that contains the current
>> jena-text
>> >>> module (depending on Lucene 4.x) and the new jena-text-es module
>> (depending
>> >>> on Lucene 6.4.1) without any compatibility issues?
>> >>>
>> >>> -Osma
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> 01.03.2017, 00:47, anuj kumar kirjoitti:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> My 2 Cents :
>> >>>>
>> >>>> The reason I proposed to have separate modules for Lucene, Solr and
>> ES is
>> >>>> exactly for avoiding the "All or Nothing" approach we need to take
>> if we
>> >>>> club them all together. If they stay together and if in the near
>> future I
>> >>>> want to upgrade ES to another version, I also need to again upgrade
>> Lucene
>> >>>> and Solr and possibly another implementation that may have been added
>> >>>> during the time. As we all know, this means weeks of work if not
>> months to
>> >>>> get the changes released. This will personally de-motivate me to do
>> >>>> anything and I will probably start maintaining my version of
>> Jena-Text as
>> >>>> that would be much simpler to do than to upgrade and test and in the
>> >>>> process own(read fix bugs) the upgrade for each and every technology.
>> >>>>
>> >>>> If they are developed as separate modules, they can evolve
>> independently
>> >>>> of
>> >>>> each other and we can avoid situations where we can't upgrade to
>> latest
>> >>>> version of Lucene because we do not know what effect it will have on
>> Solr
>> >>>> Implementation.
>> >>>>
>> >>>> We can start with having a separate Module for Jena Text ES and see
>> how
>> >>>> things go. If they go well, we could extract out Solr and Lucene out
>> of
>> >>>> Jena Text.
>> >>>>
>> >>>> Again this is just a suggestion based on my limited industry
>> experience.
>> >>>>
>> >>>> Thanks,
>> >>>> Anuj Kumar
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
>> osma.suominen@helsinki.fi
>> >>>>>
>> >>>> wrote:
>> >>>>
>> >>>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>> >>>>>
>> >>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>> >>>>>> ? In other words, might it be better to factor out between -text
>> and
>> >>>>>> -spatial and _then_ try to upgrade the Lucene version?
>> >>>>>>
>> >>>>>>
>> >>>>> I certainly wouldn't object to that, but somebody has to volunteer
>> to do
>> >>>>> the actual work!
>> >>>>>
>> >>>>> I don't use the Solr component now, but I could easily see so
>> doing...
>> >>>>>
>> >>>>>> that's pretty vague, I know, and I'm not in a position to do any
>> work to
>> >>>>>> maintain it, so consider that just a very small and blurry data
>> point.
>> >>>>>> :)
>> >>>>>>
>> >>>>>>
>> >>>>> Last time I tried it (it was a while ago) I couldn't figure out how
>> to
>> >>>>> get
>> >>>>> it running... If you could just try that with some toy data, then
>> your
>> >>>>> data
>> >>>>> point would be a lot less blurry :) I haven't used Solr for
>> anything, so
>> >>>>> I'm not very familiar with how to set it up, and the jena-text
>> >>>>> instructions
>> >>>>> are pretty vague unfortunately.
>> >>>>>
>> >>>>>
>> >>>>> -Osma
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Osma Suominen
>> >>>>> D.Sc. (Tech), Information Systems Specialist
>> >>>>> National Library of Finland
>> >>>>> P.O. Box 26 (Kaikukatu 4)
>> >>>>> 00014 HELSINGIN YLIOPISTO
>> >>>>> Tel. +358 50 3199529
>> >>>>> osma.suominen@helsinki.fi
>> >>>>> http://www.nationallibrary.fi
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>> --
>> >>> Osma Suominen
>> >>> D.Sc. (Tech), Information Systems Specialist
>> >>> National Library of Finland
>> >>> P.O. Box 26 (Kaikukatu 4)
>> >>> 00014 HELSINGIN YLIOPISTO
>> >>> Tel. +358 50 3199529
>> >>> osma.suominen@helsinki.fi
>> >>> http://www.nationallibrary.fi
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > Osma Suominen
>> > D.Sc. (Tech), Information Systems Specialist
>> > National Library of Finland
>> > P.O. Box 26 (Kaikukatu 4)
>> > 00014 HELSINGIN YLIOPISTO
>> > Tel. +358 50 3199529
>> > osma.suominen@helsinki.fi
>> > http://www.nationallibrary.fi
>>
>>
>
>
> --
> *Anuj Kumar*
>



-- 
*Anuj Kumar*

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I do prefer one specific approach because, IMO, it is modular, which
makes it much easier to maintain in the long run. But again, it may not
be the quickest one.

My company has already given me a deadline to have the ES extension
implemented in the next 15 days :). This means that I will be
maintaining the ES extension to Jena Text at least locally for the
time being. I would be more than happy to contribute to the Jena
community whatever is required to have a proper ElasticSearch
implementation in place, whether within the jena-text module or as a
separate module. Until Lucene and Solr are upgraded to the latest
version, I will have to maintain a separate module for jena-text-es.

Cheers!
Anuj Kumar


On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:

> Osma--
>
> The short answer is that yes, given the right tools you _can_ have
> different versions of code accessible in different ways. The longer answer
> is that it's probably not a viable alternative for Jena for this problem,
> at least not without a lot of other change.
>
> You are right to point to the classloader mechanism as being at the heart
> of this question, but I must alter your remark just slightly. From "the
> Java classloader only sees a single, flat package/class namespace and a set
> of compiled classes" to "ANY GIVEN Java classloader only sees a single,
> flat package/class namespace and a set of compiled classes".
>
> This is the fact that OSGi uses to make it possible to maintain strict
> module boundaries (and even dynamic module relationships at run-time). Each
> OSGi bundle sees its own classloader, and the framework is responsible for
> connecting bundles up to ensure that every bundle has what it needs in the
> way of types to function, based on metadata that the bundles provide to the
> framework. It's an incredibly powerful system (I use it every day and enjoy
> it enormously) but it's also very "heavy" and requires a good deal of
> investment to use. In particular, it's probably too large to put _inside_
> Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>
> Java 9 Jigsaw [1] offers some possibility for strong modularization of
> this kind, but it's really meant for the JDK itself, not application
> libraries. In theory, we could "roll our own" classloader management for
> this problem. That sounds like more than a bit of a rabbit hole to me.
> There might be another, more lightweight, toolkit out there to this
> purpose, but I'm not aware of any myself.
>
> Otherwise, yes, you get into shading and the like. We have to do that for
> Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
> thing we want to do any more of than needed, I don't think.
>
> ---
> A. Soroka
> The University of Virginia Library
>
> [1] http://openjdk.java.net/projects/jigsaw/
>
> > On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi>
> wrote:
> >
> > Hi Anuj!
> >
> > Thanks for the clarification.
> >
> > However, I'm still not sure I understand the situation completely. I
> know Maven can perform a lot of tricks, but Maven modules are just
> convenient ways to structure a Java project. Maven cannot change the fact
> that at runtime, module divisions don't really matter (except that they
> usually correspond to package sub-namespaces) and the Java classloader only
> sees a single, flat package/class namespace and a set of compiled classes
> (usually within JARs) in the classpath that it needs to check to find the
> right classes, and if there are two versions of the same library (eg
> Lucene) with overlapping class names, that's going to cause trouble. The
> only way around that is to shade some of the libraries, i.e. rename them so
> that they end up in another, non-conflicting namespace. Apparently
> Elasticsearch also did some of that in the past [1] but nowadays tries to
> avoid it.
> >
> > Does your assumption 1 ("At a given point in time, only a single
> Indexing Technology is used") imply that in the assembler configuration,
> you cannot have ja:loadClass declarations for both Lucene and ES backends?
> Or how do you run something like Fuseki that contains (in a single big JAR)
> both the jena-text and jena-text-es modules with all their dependencies,
> one of which requires the Lucene 4.x classes and the other one the Lucene
> 6.4.1 classes? How do you ensure that only one of them is used at a time,
> and that the Java classloader, even though it has access to both versions
> of Lucene, only loads classes from the single, correct one and not the
> other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
> packages, so that you don't end up with two Lucene versions within the same
> Fuseki JAR?
> >
> > -Osma
> >
> > [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
> >
> > 01.03.2017, 11:03, anuj kumar kirjoitti:
> >> Hi Osma,
> >>
> >> I understand what you are saying. There are ways to mitigate risks and
> >> balance the refactoring without affecting the existing modules. But I
> will
> >> not delve into those now. I am not an expert in Jena to convincingly say
> >> that it is possible, without any hiccups. But I can take a guess and say
> >> that it is indeed possible :)
> >>
> >> For the question: "is it even possible to mix modules that depend on
> >> different versions of the Lucene libraries within the same project?"
> >>
> >> I actually do not understand what you mean by mixing modules. I assume
> you
> >> mean having jena-text and jena-text-es as dependencies in a build
> without
> >> causing the build to conflict. If that is what you mean then the answer
> is
> >> yes it is possible and quite simple as well. Let me explain how it is
> >> possible. But before that some assumption which I want to call out
> >> explicitly.
> >>
> >> *Assumption:*
> >> 1. At a given point in time, only a single Indexing Technology is used
> for
> >> text based indexing and searching via Jena. What this means is that we
> will
> >> either use Lucene Implementation OR Solr Implementation OR ES
> >> Implementation at any given point in time.
> >> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but
> >> only on jena-text classes, if at all.
> >>
> >> Based on these assumptions it is possible to create a build that
> contains
> >> jena-text based common classes + ES specific classes without any
> >> compatibility issues. And it is in fact quite simple. I did it in the
> >> current jena-text-es module and ran the entire build which succeeded.
> >> The key is to include the latest Lucene dependencies at the very
> beginning
> >> in the pom and then include jena-text dependency. Maven will then
> >> automatically resolve the dependency issues by including the Lucene
> >> libraries that we included in our es specific pom. Have a look at the pom
> of
> >> jena-text-es module here to see how it can be done :
> >> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
> >>
> >>
> >> Thanks,
> >> Anuj Kumar
> >>
> >>
> >> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
> osma.suominen@helsinki.fi>
> >> wrote:
> >>
> >>> Hi Anuj,
> >>>
> >>> I understand your concerns. However, we also need to balance between
> the
> >>> needs of individual modules/features and the whole codebase. I'm
> willing to
> >>> put in the effort to keep the other modules up to date with newer
> Lucene
> >>> versions. Lucene upgrade requirements are well documented, the only
> hitches
> >>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
> >>> features that were dropped from newer versions.
> >>>
> >>> A perhaps stupid question to more experienced Java developers: is it
> even
> >>> possible to mix modules that depend on different versions of the Lucene
> >>> libraries within the same project? In my (quite limited) understanding
> of
> >>> Java projects and libraries, this requires special arrangements (e.g.
> >>> shading) as the Java package/class namespace is shared by all the code
> >>> running within the same JVM.
> >>>
> >>> So can you create, say, a Fuseki build that contains the current
> jena-text
> >>> module (depending on Lucene 4.x) and the new jena-text-es module
> (depending
> >>> on Lucene 6.4.1) without any compatibility issues?
> >>>
> >>> -Osma
> >>>
> >>>
> >>>
> >>>
> >>> 01.03.2017, 00:47, anuj kumar kirjoitti:
> >>>
> >>>> Hi,
> >>>>
> >>>> My 2 Cents :
> >>>>
> >>>> The reason I proposed to have separate modules for Lucene, Solr and
> ES is
> >>>> exactly for avoiding the "All or Nothing" approach we need to take if
> we
> >>>> club them all together. If they stay together and if in the near
> future I
> >>>> want to upgrade ES to another version, I also need to again upgrade
> Lucene
> >>>> and Solr and possibly another implementation that may have been added
> >>>> during the time. As we all know, this means weeks of work if not
> months to
> >>>> get the changes released. This will personally de-motivate me to do
> >>>> anything and I will probably start maintaining my version of
> Jena-Text as
> >>>> that would be much simpler to do than to upgrade and test and in the
> >>>> process own(read fix bugs) the upgrade for each and every technology.
> >>>>
> >>>> If they are developed as separate modules, they can evolve
> independently
> >>>> of
> >>>> each other and we can avoid situations where we can't upgrade to latest
> >>>> version of Lucene because we do not know what effect it will have on
> Solr
> >>>> Implementation.
> >>>>
> >>>> We can start with having a separate Module for Jena Text ES and see
> how
> >>>> things go. If they go well, we could extract out Solr and Lucene out
> of
> >>>> Jena Text.
> >>>>
> >>>> Again this is just a suggestion based on my limited industry
> experience.
> >>>>
> >>>> Thanks,
> >>>> Anuj Kumar
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
> osma.suominen@helsinki.fi
> >>>>>
> >>>> wrote:
> >>>>
> >>>> 28.02.2017, 17:12, A. Soroka kirjoitti:
> >>>>>
> >>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
> >>>>>> ? In other words, might it be better to factor out between -text and
> >>>>>> -spatial and _then_ try to upgrade the Lucene version?
> >>>>>>
> >>>>>>
> >>>>> I certainly wouldn't object to that, but somebody has to volunteer
> to do
> >>>>> the actual work!
> >>>>>
> >>>>> I don't use the Solr component now, but I could easily see so
> doing...
> >>>>>
> >>>>>> that's pretty vague, I know, and I'm not in a position to do any
> work to
> >>>>>> maintain it, so consider that just a very small and blurry data
> point.
> >>>>>> :)
> >>>>>>
> >>>>>>
> >>>>> Last time I tried it (it was a while ago) I couldn't figure out how
> to
> >>>>> get
> >>>>> it running... If you could just try that with some toy data, then
> your
> >>>>> data
> >>>>> point would be a lot less blurry :) I haven't used Solr for
> anything, so
> >>>>> I'm not very familiar with how to set it up, and the jena-text
> >>>>> instructions
> >>>>> are pretty vague unfortunately.
> >>>>>
> >>>>>
> >>>>> -Osma
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Osma Suominen
> >>>>> D.Sc. (Tech), Information Systems Specialist
> >>>>> National Library of Finland
> >>>>> P.O. Box 26 (Kaikukatu 4)
> >>>>> 00014 HELSINGIN YLIOPISTO
> >>>>> Tel. +358 50 3199529
> >>>>> osma.suominen@helsinki.fi
> >>>>> http://www.nationallibrary.fi
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Osma Suominen
> >>> D.Sc. (Tech), Information Systems Specialist
> >>> National Library of Finland
> >>> P.O. Box 26 (Kaikukatu 4)
> >>> 00014 HELSINGIN YLIOPISTO
> >>> Tel. +358 50 3199529
> >>> osma.suominen@helsinki.fi
> >>> http://www.nationallibrary.fi
> >>>
> >>
> >>
> >>
> >
> >
> > --
> > Osma Suominen
> > D.Sc. (Tech), Information Systems Specialist
> > National Library of Finland
> > P.O. Box 26 (Kaikukatu 4)
> > 00014 HELSINGIN YLIOPISTO
> > Tel. +358 50 3199529
> > osma.suominen@helsinki.fi
> > http://www.nationallibrary.fi
>
>


-- 
*Anuj Kumar*

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by "A. Soroka" <aj...@virginia.edu>.
Osma--

The short answer is that yes, given the right tools you _can_ have different versions of code accessible in different ways. The longer answer is that it's probably not a viable alternative for Jena for this problem, at least not without a lot of other change.

You are right to point to the classloader mechanism as being at the heart of this question, but I must alter your remark just slightly. From "the Java classloader only sees a single, flat package/class namespace and a set of compiled classes" to "ANY GIVEN Java classloader only sees a single, flat package/class namespace and a set of compiled classes".

This is the fact that OSGi uses to make it possible to maintain strict module boundaries (and even dynamic module relationships at run-time). Each OSGi bundle sees its own classloader, and the framework is responsible for connecting bundles up to ensure that every bundle has what it needs in the way of types to function, based on metadata that the bundles provide to the framework. It's an incredibly powerful system (I use it every day and enjoy it enormously) but it's also very "heavy" and requires a good deal of investment to use. In particular, it's probably too large to put _inside_ Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)

Java 9 Jigsaw [1] offers some possibility for strong modularization of this kind, but it's really meant for the JDK itself, not application libraries. In theory, we could "roll our own" classloader management for this problem. That sounds like more than a bit of a rabbit hole to me. There might be another, more lightweight, toolkit out there to this purpose, but I'm not aware of any myself. 

Otherwise, yes, you get into shading and the like. We have to do that for Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a thing we want to do any more of than needed, I don't think.
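
To make the "ANY GIVEN classloader" point above concrete, here is a minimal sketch in plain Java. It is not Jena, Lucene or OSGi code: the names LoaderDemo and Library are invented for illustration, and it assumes the demo is compiled to a directory or JAR so that its code source location is available. It loads the same class name through two separate loaders and shows that each loader gets its own copy, with its own static state.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderDemo {

    /** Stand-in for a library class (think of a Lucene type); purely hypothetical. */
    public static class Library {
        public static int counter = 0;
    }

    /** Loads Library through a fresh loader that does not delegate to the application loader. */
    public static Class<?> loadIsolated() {
        try {
            // Directory or JAR this demo was loaded from (assumed to be non-null here).
            URL here = LoaderDemo.class.getProtectionDomain().getCodeSource().getLocation();
            // A null parent means "bootstrap loader only", so the new loader must
            // define Library itself instead of reusing the application's copy.
            ClassLoader loader = new URLClassLoader(new URL[] { here }, null);
            return loader.loadClass(Library.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the static counter field of the given copy of Library. */
    public static int counterOf(Class<?> c) {
        try {
            return c.getField("counter").getInt(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes the static counter field of the given copy of Library. */
    public static void setCounter(Class<?> c, int value) {
        try {
            c.getField("counter").setInt(null, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> a = loadIsolated();
        Class<?> b = loadIsolated();
        setCounter(a, 42);
        System.out.println(a.getName().equals(b.getName()));       // true: same class name
        System.out.println(a == b);                                 // false: two distinct classes
        System.out.println(counterOf(a) + " vs " + counterOf(b));   // 42 vs 0
    }
}
```

This is roughly the mechanism that per-bundle classloaders build on, but a real solution would also have to decide which loader serves which caller, which is where the rabbit hole starts.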

---
A. Soroka
The University of Virginia Library

[1] http://openjdk.java.net/projects/jigsaw/

> On Mar 1, 2017, at 9:03 AM, Osma Suominen <os...@helsinki.fi> wrote:
> 
> Hi Anuj!
> 
> Thanks for the clarification.
> 
> However, I'm still not sure I understand the situation completely. I know Maven can perform a lot of tricks, but Maven modules are just convenient ways to structure a Java project. Maven cannot change the fact that at runtime, module divisions don't really matter (except that they usually correspond to package sub-namespaces) and the Java classloader only sees a single, flat package/class namespace and a set of compiled classes (usually within JARs) in the classpath that it needs to check to find the right classes, and if there are two versions of the same library (eg Lucene) with overlapping class names, that's going to cause trouble. The only way around that is to shade some of the libraries, i.e. rename them so that they end up in another, non-conflicting namespace. Apparently Elasticsearch also did some of that in the past [1] but nowadays tries to avoid it.
> 
> Does your assumption 1 ("At a given point in time, only a single Indexing Technology is used") imply that in the assembler configuration, you cannot have ja:loadClass declarations for both Lucene and ES backends? Or how do you run something like Fuseki that contains (in a single big JAR) both the jena-text and jena-text-es modules with all their dependencies, one of which requires the Lucene 4.x classes and the other one the Lucene 6.4.1 classes? How do you ensure that only one of them is used at a time, and that the Java classloader, even though it has access to both versions of Lucene, only loads classes from the single, correct one and not the other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that you don't end up with two Lucene versions within the same Fuseki JAR?
> 
> -Osma
> 
> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
> 
> 01.03.2017, 11:03, anuj kumar kirjoitti:
>> Hi Osma,
>> 
>> I understand what you are saying. There are ways to mitigate risks and
>> balance the refactoring without affecting the existing modules. But I will
>> not delve into those now. I am not an expert in Jena to convincingly say
>> that it is possible, without any hiccups. But I can take a guess and say
>> that it is indeed possible :)
>> 
>> For the question: "is it even possible to mix modules that depend on
>> different versions of the Lucene libraries within the same project?"
>> 
>> I actually do not understand what you mean by mixing modules. I assume you
>> mean having jena-text and jena-text-es as dependencies in a build without
>> causing the build to conflict. If that is what you mean then the answer is
>> yes it is possible and quite simple as well. Let me explain how it is
>> possible. But before that some assumption which I want to call out
>> explicitly.
>> 
>> *Assumption:*
>> 1. At a given point in time, only a single Indexing Technology is used for
>> text based indexing and searching via Jena. What this means is that we will
>> either use Lucene Implementation OR Solr Implementation OR ES
>> Implementation at any given point in time.
>> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but
>> only on jena-text classes, if at all.
>> 
>> Based on these assumptions it is possible to create a build that contains
>> jena-text based common classes + ES specific classes without any
>> compatibility issues. And it is in fact quite simple. I did it in the
>> current jena-text-es module and ran the entire build which succeeded.
>> The key is to include the latest Lucene dependencies at the very beginning
>> in the pom and then include jena-text dependency. Maven will then
>> automatically resolve the dependency issues by including the Lucene
>> libraries that we included in our es specific pom. Have a look at the pom of
>> jena-text-es module here to see how it can be done :
>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>> 
>> 
>> Thanks,
>> Anuj Kumar
>> 
>> 
>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <os...@helsinki.fi>
>> wrote:
>> 
>>> Hi Anuj,
>>> 
>>> I understand your concerns. However, we also need to balance between the
>>> needs of individual modules/features and the whole codebase. I'm willing to
>>> put in the effort to keep the other modules up to date with newer Lucene
>>> versions. Lucene upgrade requirements are well documented, the only hitches
>>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>>> features that were dropped from newer versions.
>>> 
>>> A perhaps stupid question to more experienced Java developers: is it even
>>> possible to mix modules that depend on different versions of the Lucene
>>> libraries within the same project? In my (quite limited) understanding of
>>> Java projects and libraries, this requires special arrangements (e.g.
>>> shading) as the Java package/class namespace is shared by all the code
>>> running within the same JVM.
>>> 
>>> So can you create, say, a Fuseki build that contains the current jena-text
>>> module (depending on Lucene 4.x) and the new jena-text-es module (depending
>>> on Lucene 6.4.1) without any compatibility issues?
>>> 
>>> -Osma
>>> 
>>> 
>>> 
>>> 
>>> 01.03.2017, 00:47, anuj kumar kirjoitti:
>>> 
>>>> Hi,
>>>> 
>>>> My 2 Cents :
>>>> 
>>>> The reason I proposed to have separate modules for Lucene, Solr and ES is
>>>> exactly for avoiding the "All or Nothing" approach we need to take if we
>>>> club them all together. If they stay together and if in the near future I
>>>> want to upgrade ES to another version, I also need to again upgrade Lucene
>>>> and Solr and possibly another implementation that may have been added
>>>> during the time. As we all know, this means weeks of work if not months to
>>>> get the changes released. This will personally de-motivate me to do
>>>> anything and I will probably start maintaining my version of Jena-Text as
>>>> that would be much simpler to do than to upgrade and test and in the
>>>> process own(read fix bugs) the upgrade for each and every technology.
>>>> 
>>>> If they are developed as separate modules, they can evolve independently
>>>> of
>>>> each other and we can avoid situations where we can't upgrade to latest
>>>> version of Lucene because we do not know what effect it will have on Solr
>>>> Implementation.
>>>> 
>>>> We can start with having a separate Module for Jena Text ES and see how
>>>> things go. If they go well, we could extract out Solr and Lucene out of
>>>> Jena Text.
>>>> 
>>>> Again this is just a suggestion based on my limited industry experience.
>>>> 
>>>> Thanks,
>>>> Anuj Kumar
>>>> 
>>>> 
>>>> 
>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suominen@helsinki.fi
>>>>> 
>>>> wrote:
>>>> 
>>>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>>>>> 
>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>>> ? In other words, might it be better to factor out between -text and
>>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>> 
>>>>>> 
>>>>> I certainly wouldn't object to that, but somebody has to volunteer to do
>>>>> the actual work!
>>>>> 
>>>>> I don't use the Solr component now, but I could easily see so doing...
>>>>> 
>>>>>> that's pretty vague, I know, and I'm not in a position to do any work to
>>>>>> maintain it, so consider that just a very small and blurry data point.
>>>>>> :)
>>>>>> 
>>>>>> 
>>>>> Last time I tried it (it was a while ago) I couldn't figure out how to
>>>>> get
>>>>> it running... If you could just try that with some toy data, then your
>>>>> data
>>>>> point would be a lot less blurry :) I haven't used Solr for anything, so
>>>>> I'm not very familiar with how to set it up, and the jena-text
>>>>> instructions
>>>>> are pretty vague unfortunately.
>>>>> 
>>>>> 
>>>>> -Osma
>>>>> 
>>>>> 
>>>>> --
>>>>> Osma Suominen
>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>> National Library of Finland
>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>> 00014 HELSINGIN YLIOPISTO
>>>>> Tel. +358 50 3199529
>>>>> osma.suominen@helsinki.fi
>>>>> http://www.nationallibrary.fi
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 


Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Anuj!

Thanks for the clarification.

However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just a
convenient way to structure a Java project. Maven cannot change the
fact that at runtime, module divisions don't really matter (except that
they usually correspond to package sub-namespaces). The Java
classloader only sees a single, flat package/class namespace and a set
of compiled classes (usually within JARs) on the classpath, which it
searches to find the right classes. If there are two versions of the
same library (e.g. Lucene) with overlapping class names, that's going to
cause trouble. The usual way around that is to shade some of the
libraries, i.e. relocate their packages so that they end up in another,
non-conflicting namespace. Apparently Elasticsearch also did some of
that in the past [1] but nowadays tries to avoid it.
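
To make the flat-namespace concern concrete, here is a minimal sketch (not
part of Jena or Fuseki; the class name ClasspathProbe and its methods are
made up for illustration) that asks the classloader for every location
providing a given class. It assumes a plain classpath of separate JARs and
uses org.apache.lucene.util.Version only as a convenient probe class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: reports every classpath entry that contains a class.
final class ClasspathProbe {

    /** Turns "org.example.Foo" into the resource path "org/example/Foo.class". */
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Lists every location visible to the classloader that provides the class. */
    static List<URL> locations(String className) {
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<URL> result = new ArrayList<>();
        try {
            Enumeration<URL> found = loader.getResources(toResourcePath(className));
            while (found.hasMoreElements()) {
                result.add(found.nextElement());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // The default probe class is an assumption; pass another name as args[0].
        String name = args.length > 0 ? args[0] : "org.apache.lucene.util.Version";
        List<URL> hits = locations(name);
        System.out.println(name + " is provided by " + hits.size() + " location(s):");
        for (URL u : hits) {
            System.out.println("  " + u);
        }
    }
}
```

If this prints more than one location for a Lucene class, two copies of the
library are visible to the same classloader, and which one actually gets
used depends on classpath order.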

Does your assumption 1 ("At a given point in time, only a single 
Indexing Technology is used") imply that in the assembler configuration, 
you cannot have ja:loadClass declarations for both Lucene and ES 
backends? Or how do you run something like Fuseki that contains (in a 
single big JAR) both the jena-text and jena-text-es modules with all 
their dependencies, one of which requires the Lucene 4.x classes and the 
other one the Lucene 6.4.1 classes? How do you ensure that only one of 
them is used at a time, and that the Java classloader, even though it 
has access to both versions of Lucene, only loads classes from the 
single, correct one and not the other? Or do you need to have separate 
"Fuseki-Lucene" and "Fuseki-ES" packages, so that you don't end up with 
two Lucene versions within the same Fuseki JAR?
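
One way to check which copy actually wins at runtime is to ask a loaded
class where it came from. The following is only a sketch under my own
assumptions (the class WhichJar is hypothetical and not something Fuseki
ships); it uses the standard ProtectionDomain/CodeSource API.

```java
import java.security.CodeSource;

// Hypothetical helper: reports the JAR or directory a class was loaded from.
final class WhichJar {

    /** Returned for classes that have no code source, e.g. java.lang.String. */
    static final String JDK_RUNTIME = "(bootstrap or JDK runtime)";

    static String loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return JDK_RUNTIME;
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass e.g. org.apache.lucene.util.Version when Lucene is on the classpath.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + loadedFrom(Class.forName(name)));
    }
}
```

Run against a Lucene class inside a combined build, this would show whether
the 4.x or the 6.4.1 JAR supplied it.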

-Osma

[1] https://www.elastic.co/blog/to-shade-or-not-to-shade

01.03.2017, 11:03, anuj kumar kirjoitti:
> Hi Osma,
>
> I understand what you are saying. There are ways to mitigate risks and
> balance the refactoring without affecting the existing modules. But I will
> not delve into those now. I am not an expert in Jena to convincingly say
> that it is possible, without any hiccups. But I can take a guess and say
> that it is indeed possible :)
>
> For the question: "is it even possible to mix modules that depend on
> different versions of the Lucene libraries within the same project?"
>
> I actually do not understand what you mean by mixing modules. I assume you
> mean having jena-text and jena-text-es as dependencies in a build without
> causing the build to conflict. If that is what you mean then the answer is
> yes it is possible and quite simple as well. Let me explain how it is
> possible. But before that some assumptions which I want to call out
> explicitly.
>
> *Assumptions:*
> 1. At a given point in time, only a single Indexing Technology is used for
> text-based indexing and searching via Jena. What this means is that we will
> either use Lucene Implementation OR Solr Implementation OR ES
> Implementation at any given point in time.
> 2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but
> only on jena-text classes, if at all.
>
> Based on these assumptions it is possible to create a build that contains
> the common jena-text classes + ES-specific classes without any
> compatibility issues. And it is in fact quite simple. I did it in the
> current jena-text-es module and ran the entire build, which succeeded.
> The key is to include the latest Lucene dependencies at the very beginning
> of the pom and then include the jena-text dependency. Maven will then
> automatically resolve the dependency issues by including the Lucene
> libraries that we included in our ES-specific pom. Have a look at the pom of
> the jena-text-es module here to see how it can be done:
> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>
>
> Thanks,
> Anuj Kumar
>
>
> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <os...@helsinki.fi>
> wrote:
>
>> Hi Anuj,
>>
>> I understand your concerns. However, we also need to balance between the
>> needs of individual modules/features and the whole codebase. I'm willing to
>> put in the effort to keep the other modules up to date with newer Lucene
>> versions. Lucene upgrade requirements are well documented, the only hitches
>> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
>> features that were dropped from newer versions.
>>
>> A perhaps stupid question to more experienced Java developers: is it even
>> possible to mix modules that depend on different versions of the Lucene
>> libraries within the same project? In my (quite limited) understanding of
>> Java projects and libraries, this requires special arrangements (e.g.
>> shading) as the Java package/class namespace is shared by all the code
>> running within the same JVM.
>>
>> So can you create, say, a Fuseki build that contains the current jena-text
>> module (depending on Lucene 4.x) and the new jena-text-es module (depending
>> on Lucene 6.4.1) without any compatibility issues?
>>
>> -Osma
>>
>>
>>
>>
>> 01.03.2017, 00:47, anuj kumar kirjoitti:
>>
>>> Hi,
>>>
>>> My 2 Cents :
>>>
>>>  The reason I proposed to have separate modules for Lucene, Solr and ES is
>>> exactly for avoiding the "All or Nothing" approach we need to take if we
>>> club them all together. If they stay together and if in the near future I
>>> want to upgrade ES to another version, I also need to again upgrade Lucene
>>> and Solr and possibly another implementation that may have been added
>>> during the time. As we all know, this means weeks of work if not months to
>>> get the changes released. This will personally de-motivate me to do
>>> anything and I will probably start maintaining my version of Jena-Text as
>>> that would be much simpler to do than to upgrade and test and in the
>>> process own(read fix bugs) the upgrade for each and every technology.
>>>
>>> If they are developed as separate modules, they can evolve independently
>>> of each other and we can avoid situations where we can't upgrade to the
>>> latest version of Lucene because we do not know what effect it will have
>>> on the Solr implementation.
>>>
>>> We can start with having a separate Module for Jena Text ES and see how
>>> things go. If they go well, we could extract out Solr and Lucene out of
>>> Jena Text.
>>>
>>> Again this is just a suggestion based on my limited industry experience.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>>
>>>
>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suominen@helsinki.fi>
>>> wrote:
>>>
>>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>>>>
>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>>> ? In other words, might it be better to factor out between -text and
>>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>>
>>>>>
>>>> I certainly wouldn't object to that, but somebody has to volunteer to do
>>>> the actual work!
>>>>
>>>> I don't use the Solr component now, but I could easily see so doing...
>>>>
>>>>> that's pretty vague, I know, and I'm not in a position to do any work to
>>>>> maintain it, so consider that just a very small and blurry data point.
>>>>> :)
>>>>>
>>>>>
>>>> Last time I tried it (it was a while ago) I couldn't figure out how to get
>>>> it running... If you could just try that with some toy data, then your data
>>>> point would be a lot less blurry :) I haven't used Solr for anything, so
>>>> I'm not very familiar with how to set it up, and the jena-text instructions
>>>> are pretty vague unfortunately.
>>>>
>>>>
>>>> -Osma
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine

Posted by anuj kumar <an...@gmail.com>.
Hi Osma,

I understand what you are saying. There are ways to mitigate risks and
balance the refactoring without affecting the existing modules. But I will
not delve into those now. I am not an expert in Jena to convincingly say
that it is possible, without any hiccups. But I can take a guess and say
that it is indeed possible :)

For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"

I actually do not understand what you mean by mixing modules. I assume you
mean having jena-text and jena-text-es as dependencies in a build without
causing the build to conflict. If that is what you mean then the answer is
yes it is possible and quite simple as well. Let me explain how it is
possible. But before that some assumptions which I want to call out
explicitly.

*Assumptions:*
1. At a given point in time, only a single Indexing Technology is used for
text-based indexing and searching via Jena. What this means is that we will
either use Lucene Implementation OR Solr Implementation OR ES
Implementation at any given point in time.
2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but
only on jena-text classes, if at all.

Based on these assumptions it is possible to create a build that contains
the common jena-text classes + ES-specific classes without any
compatibility issues. And it is in fact quite simple. I did it in the
current jena-text-es module and ran the entire build, which succeeded.
The key is to include the latest Lucene dependencies at the very beginning
of the pom and then include the jena-text dependency. Maven will then
automatically resolve the dependency issues by including the Lucene
libraries that we included in our ES-specific pom. Have a look at the pom of
the jena-text-es module here to see how it can be done:
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
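
The resolution described above matches Maven's "nearest definition" rule: a
version declared directly in the pom beats one pulled in transitively, and
on a tie the first declaration wins. Below is a toy model of that rule in
Java. It is only an illustration, not Maven's actual code; the class
NearestWins and the artifact list are made up, and only the Lucene versions
6.4.1 and 4.9.1 come from this thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of Maven's "nearest definition" dependency mediation.
final class NearestWins {

    /** One dependency as seen in the tree: depth 1 = declared directly in the pom. */
    static final class Dep {
        final String artifact;
        final String version;
        final int depth;

        Dep(String artifact, String version, int depth) {
            this.artifact = artifact;
            this.version = version;
            this.depth = depth;
        }
    }

    /** Picks one version per artifact: smallest depth wins, first declared wins ties. */
    static Map<String, String> mediate(List<Dep> inDeclarationOrder) {
        Map<String, Dep> chosen = new LinkedHashMap<>();
        for (Dep d : inDeclarationOrder) {
            Dep current = chosen.get(d.artifact);
            if (current == null || d.depth < current.depth) {
                chosen.put(d.artifact, d);
            }
        }
        Map<String, String> versions = new LinkedHashMap<>();
        for (Dep d : chosen.values()) {
            versions.put(d.artifact, d.version);
        }
        return versions;
    }

    public static void main(String[] args) {
        List<Dep> deps = Arrays.asList(
            new Dep("lucene-core", "6.4.1", 1),  // declared directly in the ES pom
            new Dep("jena-text", "n/a", 1),      // version irrelevant for this sketch
            new Dep("lucene-core", "4.9.1", 2)); // transitive, via jena-text
        System.out.println(mediate(deps).get("lucene-core"));
    }
}
```

Under this rule only one Lucene version reaches the classpath, so any
jena-text code included in such a build runs against that version rather
than the one it was compiled against.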


Thanks,
Anuj Kumar


On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <os...@helsinki.fi>
wrote:

> Hi Anuj,
>
> I understand your concerns. However, we also need to balance between the
> needs of individual modules/features and the whole codebase. I'm willing to
> put in the effort to keep the other modules up to date with newer Lucene
> versions. Lucene upgrade requirements are well documented, the only hitches
> seen in JENA-1250 were related to how jena-text (ab)used some Lucene
> features that were dropped from newer versions.
>
> A perhaps stupid question to more experienced Java developers: is it even
> possible to mix modules that depend on different versions of the Lucene
> libraries within the same project? In my (quite limited) understanding of
> Java projects and libraries, this requires special arrangements (e.g.
> shading) as the Java package/class namespace is shared by all the code
> running within the same JVM.
>
> So can you create, say, a Fuseki build that contains the current jena-text
> module (depending on Lucene 4.x) and the new jena-text-es module (depending
> on Lucene 6.4.1) without any compatibility issues?
>
> -Osma
>
>
>
>
> 01.03.2017, 00:47, anuj kumar kirjoitti:
>
>> Hi,
>>
>> My 2 Cents :
>>
>>  The reason I proposed to have separate modules for Lucene, Solr and ES is
>> exactly for avoiding the "All or Nothing" approach we need to take if we
>> club them all together. If they stay together and if in the near future I
>> want to upgrade ES to another version, I also need to again upgrade Lucene
>> and Solr and possibly another implementation that may have been added
>> during the time. As we all know, this means weeks of work if not months to
>> get the changes released. This will personally de-motivate me to do
>> anything and I will probably start maintaining my version of Jena-Text as
>> that would be much simpler to do than to upgrade and test and in the
>> process own(read fix bugs) the upgrade for each and every technology.
>>
>> If they are developed as separate modules, they can evolve independently
>> of each other and we can avoid situations where we can't upgrade to the
>> latest version of Lucene because we do not know what effect it will have
>> on the Solr implementation.
>>
>> We can start with having a separate Module for Jena Text ES and see how
>> things go. If they go well, we could extract out Solr and Lucene out of
>> Jena Text.
>>
>> Again this is just a suggestion based on my limited industry experience.
>>
>> Thanks,
>> Anuj Kumar
>>
>>
>>
>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suominen@helsinki.fi>
>> wrote:
>>
>> 28.02.2017, 17:12, A. Soroka kirjoitti:
>>>
>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
>>>> ? In other words, might it be better to factor out between -text and
>>>> -spatial and _then_ try to upgrade the Lucene version?
>>>>
>>>>
>>> I certainly wouldn't object to that, but somebody has to volunteer to do
>>> the actual work!
>>>
>>> I don't use the Solr component now, but I could easily see so doing...
>>>
>>>> that's pretty vague, I know, and I'm not in a position to do any work to
>>>> maintain it, so consider that just a very small and blurry data point.
>>>> :)
>>>>
>>>>
>>> Last time I tried it (it was a while ago) I couldn't figure out how to get
>>> it running... If you could just try that with some toy data, then your data
>>> point would be a lot less blurry :) I haven't used Solr for anything, so
>>> I'm not very familiar with how to set it up, and the jena-text instructions
>>> are pretty vague unfortunately.
>>>
>>>
>>> -Osma
>>>
>>>
>>>
>>>
>>
>>
>>
>
>



-- 
*Anuj Kumar*