Posted to dev@jena.apache.org by Robert Vesse <rv...@yarcdata.com> on 2012/05/25 18:49:49 UTC

LARQ (and other Fuseki extensions) Integration Strategy

In the context of the discussion of next releases, maybe this is a good time to revisit the topic of LARQ and its integration with Fuseki.

As I understand it, the way to deploy LARQ right now is to modify the Fuseki POM to pull in the LARQ dependency, so that one can build a modified Fuseki uber jar with the LARQ and Lucene features enabled?  Paolo, please correct me if I am mistaken.

In the interests of extensibility, maybe it would be better to move to a model that allows for easier drop-in extension of Fuseki.  What I'm thinking is that you have an empty lib directory in the Fuseki distribution and then, rather than having users call the JAR directly, you have a simple script that invokes the JAR and sets the class path to lib/*, thereby picking up any drop-in extensions users have.

It may even be sensible to move the Fuseki JAR into that lib directory and just invoke java with a class path of lib/*, calling out the main class FusekiCmd explicitly and not using the -jar argument at all, e.g.

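#!/bin/bash

# Quote lib/* so that the JVM, not the shell, expands the class path wildcard,
# and pass the script's arguments through unchanged.
java -cp "lib/*" org.apache.jena.fuseki.FusekiCmd "$@"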

Forgive me if the above syntax is not exactly correct, and you'd maybe want to do some JAVA_HOME shenanigans in there, but hopefully you get the general idea.
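
For comparison, the same drop-in idea can be sketched on the Java side instead of in a wrapper script. This is only an illustration under stated assumptions: LibDirLauncher and jarUrls are hypothetical names, not part of Fuseki, and the sketch swaps the -cp wildcard for a URLClassLoader built from every JAR found in lib. Only the lib directory and the FusekiCmd main class come from the message above.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, for illustration only; not shipped with Fuseki.
class LibDirLauncher {

    // Collect a URL for every entry directly inside libDir whose name ends in ".jar".
    static URL[] jarUrls(Path libDir) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return urls.toArray(new URL[0]);
    }

    public static void main(String[] args) throws Exception {
        // The "lib" directory and the main class name mirror the script above.
        URL[] urls = jarUrls(Paths.get("lib"));
        ClassLoader loader = new URLClassLoader(urls, LibDirLauncher.class.getClassLoader());
        // Make the JARs visible to code that looks classes up via the context loader.
        Thread.currentThread().setContextClassLoader(loader);
        Class<?> mainClass = Class.forName("org.apache.jena.fuseki.FusekiCmd", true, loader);
        Method main = mainClass.getMethod("main", String[].class);
        // Cast to Object so the String[] is passed as one argument, not as varargs.
        main.invoke(null, (Object) args);
    }
}
```

The shell wrapper is simpler and is probably all that is needed; the Java variant only shows that nothing about the approach depends on the shell.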

Is this a sensible suggestion?

It seems like it would make it easier to drop in proposed future extensions like GeoSPARQL without users having to recompile the code themselves.

Rob

Re: LARQ (and other Fuseki extensions) Integration Strategy

Posted by Andy Seaborne <an...@apache.org>.

On 28/05/12 12:35, Paolo Castagna wrote:
> Andy Seaborne wrote:
>> On 25/05/12 19:27, Paolo Castagna wrote:
>>> Hi Rob
>>>
>>> Robert Vesse wrote:
>>>> There's nothing to stop us making LARQ generate an uber-jar in the same
>>>> way we do for Fuseki with maven shade is there?
>>>>
>>>> This would be sufficient for users if they only had to drop in a single
>>>> extra JAR into a directory - most users can manage that ;-)
>>>
>>> My 0 hasn't flipped to +1, but if you need that feel free to go ahead.
>>>
>>>> Yes that is my concern that we're making users jump through a lot of
>>>> hoops
>>>> to extend things
>>>
>>> Yep.
>>>
>>>> I thought assemblers can be used to get arbitrary classes loaded so can
>>>> you not theoretically have some initializer class for the plugin
>>>> (whether
>>>> LARQ or otherwise) and add lines to the assembler to get it loaded.
>>>>
>>>> It seems like assemblers would solve other current problems with LARQ
>>>> such
>>>> as the ability to specify custom analyzers, index directories etc.
>>>> Doing
>>>> everything via assembler seems the logical way to go for plugins like
>>>> LARQ
>>>
>>> I am by no means an Assembler expert and perhaps you are right. I hope
>>> so.
>>>
>>> If I remember correctly the problem I had was that in order to build a
>>> LARQ
>>> index, I needed a Dataset, but I did not find a way to get a reference
>>> to an
>>> object created by another assembler module from the LARQ ones. I hope I
>>> explained this clearly enough.
>>>
>>> Paolo
>>
>> +1 to assemblers.
>>
>> And in general, the "drop jars into a dir" idea works very well with
>> assemblers because
>>
>> [] ja:loadClass "x.y.z.someclass" .
>>
>> mechanism also invokes a static init() method so the code gets a chance
>> to plug into the system during startup.
>>
>> Paolo - there are various examples of assembler code in the codebase
>> e.g. TDB :-)
>
> Hi Andy,
> good, thanks for the +1 on assemblers.
>
> The thing that I struggled with is how to get a reference to a Dataset (built by
> an assembler in TDB or other systems, SDB) when I need to build an IndexLARQ.
>
> I need a reference to a Dataset since I want to register a listener to be
> notified as new triples|quads are added/removed (and update Lucene index
> accordingly).
>
>    Dataset dataset = ... ;
>    IndexBuilderModel larqBuilder = new IndexBuilderString(indexWriter) ;
>    dataset.getDefaultModel().register(larqBuilder);
>    for ( Iterator<String>  iter = dataset.listNames() ; iter.hasNext() ; ) {
>        String g = iter.next() ;
>        dataset.getNamedModel(g).register(larqBuilder) ;
>    }
>
> This was the reasons of the 'pollution' and call to make(...) via reflection in
> ARQ's DatasetAssembler.
>
> Is there an example of an assembler which uses and/or get a reference to an
> object build elsewhere via another assembler?

ARQ's general dataset may help - it has embedded assemblers for each 
graph.  It happens to use assembler.openModel -- you'll need the general 
mechanism using a cast of:

	(Thing)Assembler.general.open(resource) ;

And if you are doing it by intercepting the call to create a 
dataset, the Resource argument is the root of the description.

See AssemblerUtils which is some helper code which may help, or may not.

How does the LARQ assembler link to a dataset? Something in the ARQ 
assembler item or a reference in the dataset assembler part?

When you have a reference (resource) for the object to build, call the 
"open" to get an Object and cast it.

	Andy
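
The "open, then cast" step described above can be sketched without any Jena classes. Everything named here is an assumption made for illustration: ObjectOpener is a hypothetical stand-in for whatever turns a description reference into an Object (the role the general assembler's open call plays above), and openAs is a hypothetical helper that does the cast with a readable failure when the description built something else.

```java
// Hypothetical stand-in for an assembler: turns a description reference into an Object.
// This is NOT a Jena interface.
interface ObjectOpener {
    Object open(String resource);
}

class OpenAndCast {

    // Open the referenced object and cast it to the expected type,
    // failing with a clear message if something else (or nothing) was built.
    static <T> T openAs(ObjectOpener opener, String resource, Class<T> type) {
        Object built = opener.open(resource);
        if (!type.isInstance(built)) {
            throw new IllegalStateException(resource + " did not build a " + type.getName());
        }
        return type.cast(built);
    }
}
```

A LARQ assembler written along these lines would follow the reference from its own description to the dataset description, open it, and only then register its listeners on the result.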

>
> This is what I am not sure is possible and if it is possible, I do not know how
> to do it with assemblers.
>
> A similar requirement would arise if we want to have an assembler which build a
> Dataset wrapping an existing assembler object built by another assembler from
> TDB or SDB.
>
> (I hope I explained my problem clearly enough... I have not tried yet to do this
> again, from scratch and avoiding any change to ARQ or TDB code as it should be
> to allow third party to plug-in their extensions transparently).
>
> Paolo
>
>>
>>      Andy
>

Re: LARQ (and other Fuseki extensions) Integration Strategy

Posted by Paolo Castagna <ca...@googlemail.com>.

Andy Seaborne wrote:
> On 25/05/12 19:27, Paolo Castagna wrote:
>> Hi Rob
>>
>> Robert Vesse wrote:
>>> There's nothing to stop us making LARQ generate an uber-jar in the same
>>> way we do for Fuseki with maven shade is there?
>>>
>>> This would be sufficient for users if they only had to drop in a single
>>> extra JAR into a directory - most users can manage that ;-)
>>
>> My 0 hasn't flipped to +1, but if you need that feel free to go ahead.
>>
>>> Yes that is my concern that we're making users jump through a lot of
>>> hoops
>>> to extend things
>>
>> Yep.
>>
>>> I thought assemblers can be used to get arbitrary classes loaded so can
>>> you not theoretically have some initializer class for the plugin
>>> (whether
>>> LARQ or otherwise) and add lines to the assembler to get it loaded.
>>>
>>> It seems like assemblers would solve other current problems with LARQ
>>> such
>>> as the ability to specify custom analyzers, index directories etc. 
>>> Doing
>>> everything via assembler seems the logical way to go for plugins like
>>> LARQ
>>
>> I am by no means an Assembler expert and perhaps you are right. I hope
>> so.
>>
>> If I remember correctly the problem I had was that in order to build a
>> LARQ
>> index, I needed a Dataset, but I did not find a way to get a reference
>> to an
>> object created by another assembler module from the LARQ ones. I hope I
>> explained this clearly enough.
>>
>> Paolo
> 
> +1 to assemblers.
> 
> And in general, the "drop jars into a dir" idea works very well with
> assemblers because
> 
> [] ja:loadClass "x.y.z.someclass" .
> 
> mechanism also invokes a static init() method so the code gets a chance
> to plug into the system during startup.
> 
> Paolo - there are various examples of assembler code in the codebase
> e.g. TDB :-)

Hi Andy,
good, thanks for the +1 on assemblers.

The thing that I struggled with is how to get a reference to a Dataset (built by
an assembler in TDB or other systems, SDB) when I need to build an IndexLARQ.

I need a reference to a Dataset since I want to register a listener to be
notified as new triples|quads are added/removed (and update Lucene index
accordingly).

  Dataset dataset = ... ;
  IndexBuilderModel larqBuilder = new IndexBuilderString(indexWriter) ;
  dataset.getDefaultModel().register(larqBuilder);
  for ( Iterator<String> iter = dataset.listNames() ; iter.hasNext() ; ) {
      String g = iter.next() ;
      dataset.getNamedModel(g).register(larqBuilder) ;
  }

This was the reason for the 'pollution' and the call to make(...) via reflection in
ARQ's DatasetAssembler.

Is there an example of an assembler which uses and/or gets a reference to an
object built elsewhere via another assembler?

This is what I am not sure is possible and, if it is possible, I do not know how
to do it with assemblers.

A similar requirement would arise if we wanted an assembler which builds a
Dataset wrapping an existing object built by another assembler from
TDB or SDB.

(I hope I explained my problem clearly enough... I have not yet tried to do this
again, from scratch and avoiding any change to ARQ or TDB code, as it should be
to allow third parties to plug in their extensions transparently).

Paolo

> 
>     Andy

Re: LARQ (and other Fuseki extensions) Integration Strategy

Posted by Andy Seaborne <an...@apache.org>.

On 25/05/12 19:27, Paolo Castagna wrote:
> Hi Rob
>
> Robert Vesse wrote:
>> There's nothing to stop us making LARQ generate an uber-jar in the same
>> way we do for Fuseki with maven shade is there?
>>
>> This would be sufficient for users if they only had to drop in a single
>> extra JAR into a directory - most users can manage that ;-)
>
> My 0 hasn't flipped to +1, but if you need that feel free to go ahead.
>
>> Yes that is my concern that we're making users jump through a lot of hoops
>> to extend things
>
> Yep.
>
>> I thought assemblers can be used to get arbitrary classes loaded so can
>> you not theoretically have some initializer class for the plugin (whether
>> LARQ or otherwise) and add lines to the assembler to get it loaded.
>>
>> It seems like assemblers would solve other current problems with LARQ such
>> as the ability to specify custom analyzers, index directories etc.  Doing
>> everything via assembler seems the logical way to go for plugins like LARQ
>
> I am by no means an Assembler expert and perhaps you are right. I hope so.
>
> If I remember correctly the problem I had was that in order to build a LARQ
> index, I needed a Dataset, but I did not find a way to get a reference to an
> object created by another assembler module from the LARQ ones. I hope I
> explained this clearly enough.
>
> Paolo

+1 to assemblers.

And in general, the "drop jars into a dir" idea works very well with 
assemblers because the

[] ja:loadClass "x.y.z.someclass" .

mechanism also invokes a static init() method so the code gets a chance 
to plug into the system during startup.
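
As a rough illustration of that hook (a sketch of the idea only, not the actual assembler code), loading a class by name and calling its static init() comes down to a few lines of reflection. InitLoader, loadAndInit and ExamplePlugin are hypothetical names introduced for this example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper showing the "load class, then call static init()" pattern.
class InitLoader {

    // Load className, then invoke its public static no-argument init() if it has one.
    // Returns true when init() was found and invoked, false otherwise.
    static boolean loadAndInit(String className) throws Exception {
        Class<?> cls = Class.forName(className);
        try {
            Method init = cls.getMethod("init");
            if (!Modifier.isStatic(init.getModifiers())) {
                return false; // an instance method named init() is not the hook
            }
            init.invoke(null);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // the class loaded, but it has no init() hook
        }
    }
}

// Hypothetical plugin: a real one would register its extensions inside init().
class ExamplePlugin {
    static boolean initialised = false;

    public static void init() {
        initialised = true;
    }
}
```

With a JAR dropped into the class path, the only configuration a user would then need is the name of the class to load.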

Paolo - there are various examples of assembler code in the codebase 
e.g. TDB :-)

	Andy

Re: LARQ (and other Fuseki extensions) Integration Strategy

Posted by Paolo Castagna <ca...@googlemail.com>.

Hi Rob

Robert Vesse wrote:
> There's nothing to stop us making LARQ generate an uber-jar in the same
> way we do for Fuseki with maven shade is there?
> 
> This would be sufficient for users if they only had to drop in a single
> extra JAR into a directory - most users can manage that ;-)

My 0 hasn't flipped to +1, but if you need that feel free to go ahead.

> Yes that is my concern that we're making users jump through a lot of hoops
> to extend things

Yep.

> I thought assemblers can be used to get arbitrary classes loaded so can
> you not theoretically have some initializer class for the plugin (whether
> LARQ or otherwise) and add lines to the assembler to get it loaded.
> 
> It seems like assemblers would solve other current problems with LARQ such
> as the ability to specify custom analyzers, index directories etc.  Doing
> everything via assembler seems the logical way to go for plugins like LARQ

I am by no means an Assembler expert and perhaps you are right. I hope so.

If I remember correctly the problem I had was that in order to build a LARQ
index, I needed a Dataset, but I did not find a way to get a reference to an
object created by another assembler module from the LARQ ones. I hope I
explained this clearly enough.

Paolo

Re: LARQ (and other Fuseki extensions) Integration Strategy

Posted by Robert Vesse <rv...@yarcdata.com>.

On 5/25/12 10:25 AM, "Paolo Castagna" <ca...@googlemail.com>
wrote:

>Hi Rob
>
>Robert Vesse wrote:
>> In the context of the discussion of next releases maybe this is a good
>>time to revisit the topic of LARQ and its integration with Fuseki
>> 
>> As I understand it right now the way to deploy LARQ is to modify the
>>Fuseki POM to pull in the LARQ dependency so that one can built a
>>modified Fuseki uber jar with the LARQ and Lucene features enabled?
>>Paolo please correct me if I am mistaken
>
>Exactly. See also https://issues.apache.org/jira/browse/JENA-63 (which
>has an
>up-to-date patch attached).
>
>> In the interests of extensibility maybe it would be better to move to a
>>model that allow for easier drop in extension of Fuseki.  What I'm
>>thinking is that you have an empty lib directory in the Fuseki
>>distribution and then rather than having users call the JAR directly you
>>have a simple script that invokes the JAR and instantiates the class
>>path to be lib/* thereby picking up any drop in extensions users have
>
>It's a possibility, but you lose the advantage and simplicity of
>everything in a
>single JAR (and from what I can see, people (including me) like it... a
>lot!).
>
>Also, you need to make sure you put under ../lib all the necessary jars.
>For LARQ, for example, this includes Lucene as well as LARQ jar itself.
>Easy/trivial (but some manage to get this wrong and/or they mix and match
>wrong
>versions when they do things manually).

There's nothing to stop us making LARQ generate an uber-jar in the same
way we do for Fuseki with maven shade, is there?

This would be sufficient for users if they only had to drop a single
extra JAR into a directory - most users can manage that ;-)

>
>> It may even be sensible to move the Fuseki JAR into that lib directory
>>and just invoke java with a class path of lib/* and call out the main
>>class FusekiCmd explicitly and not use the -jar argument at all e.g.
>> 
>> #!/bin/bash
>> 
>> Java -cp lib/* org.apache.jena.fuseki.FusekiCmd $*
>> 
>> Forgive me if the above syntax is not exactly correct and you'd maybe
>>want to do some JAVA_HOME shenanigans in there but hopefully you get the
>>general idea.
>> 
>> Is this a sensible suggestion?
>
>I have no strong opinion about this.
>
>Personally, I prefer to have a single jar and use java -jar ... (but,
>it's just
>me and my preference). I can see the advantages of what you suggest. But,
>even
>if we do it, I would probably still 'patch' things and build my own
>single jar.
>
>> It seems like it would make it easier to drop in proposed future
>>extensions like GeoSPARQL without getting into users having to recompile
>>the code themselves.
>
>I don't disagree on the principle.
>
>I think allowing users to drop in extensions (plug-ins) to Fuseki is
>really an
>useful feature.
>
>Some use ServiceLoader (+/- a dependency injection framework), others
>write
>their own plug-in framework, others use a third party plug-in framework.
>Some
>use OSGi, etc. Jena has Assemblers, but I am not sure they can be
>stretched that
>far.

Yes, that is my concern: that we're making users jump through a lot of hoops
to extend things.

I thought assemblers could be used to get arbitrary classes loaded, so can
you not theoretically have some initializer class for the plugin (whether
LARQ or otherwise) and add lines to the assembler to get it loaded?

It seems like assemblers would solve other current problems with LARQ, such
as the inability to specify custom analyzers, index directories etc.  Doing
everything via assembler seems the logical way to go for plugins like LARQ.

>
>I do not have enough experience with Assemblers to know if it is actually
>possible to drop in a jar (which extends ARQ) in the classpath and have
>Fuseki
>picking it up transparently. When I tried to do that with LARQ, I failed
>and I
>needed to make small (and very ugly) changes in ARQ (which does not feel
>right).
>
>To summarize, I am not against or over exited by what you suggest.
>It makes sense, but I would probably not use it.
>In Apache lingo, I would say: 0.
>
>Paolo
>
>> 
>> Rob
>> 
>

Rob

Re: LARQ (and other Fuseki extensions) Integration Strategy

Posted by Paolo Castagna <ca...@googlemail.com>.

Hi Rob

Robert Vesse wrote:
> In the context of the discussion of next releases maybe this is a good time to revisit the topic of LARQ and its integration with Fuseki
> 
> As I understand it right now the way to deploy LARQ is to modify the Fuseki POM to pull in the LARQ dependency so that one can built a modified Fuseki uber jar with the LARQ and Lucene features enabled?  Paolo please correct me if I am mistaken

Exactly. See also https://issues.apache.org/jira/browse/JENA-63 (which has an
up-to-date patch attached).

> In the interests of extensibility maybe it would be better to move to a model that allow for easier drop in extension of Fuseki.  What I'm thinking is that you have an empty lib directory in the Fuseki distribution and then rather than having users call the JAR directly you have a simple script that invokes the JAR and instantiates the class path to be lib/* thereby picking up any drop in extensions users have

It's a possibility, but you lose the advantage and simplicity of everything in a
single JAR (and from what I can see, people (including me) like it... a lot!).

Also, you need to make sure you put all the necessary jars under ../lib.
For LARQ, for example, this includes Lucene as well as the LARQ jar itself.
Easy/trivial (but some manage to get this wrong and/or they mix and match the
wrong versions when they do things manually).

> It may even be sensible to move the Fuseki JAR into that lib directory and just invoke java with a class path of lib/* and call out the main class FusekiCmd explicitly and not use the -jar argument at all e.g.
> 
> #!/bin/bash
> 
> Java -cp lib/* org.apache.jena.fuseki.FusekiCmd $*
> 
> Forgive me if the above syntax is not exactly correct and you'd maybe want to do some JAVA_HOME shenanigans in there but hopefully you get the general idea.
> 
> Is this a sensible suggestion?

I have no strong opinion about this.

Personally, I prefer to have a single jar and use java -jar ... (but, it's just
me and my preference). I can see the advantages of what you suggest. But, even
if we do it, I would probably still 'patch' things and build my own single jar.

> It seems like it would make it easier to drop in proposed future extensions like GeoSPARQL without getting into users having to recompile the code themselves.

I don't disagree on the principle.

I think allowing users to drop in extensions (plug-ins) to Fuseki is a really
useful feature.

Some use ServiceLoader (+/- a dependency injection framework), others write
their own plug-in framework, others use a third party plug-in framework. Some
use OSGi, etc. Jena has Assemblers, but I am not sure they can be stretched that
far.

I do not have enough experience with Assemblers to know if it is actually
possible to drop a jar (which extends ARQ) into the classpath and have Fuseki
pick it up transparently. When I tried to do that with LARQ, I failed and I
needed to make small (and very ugly) changes in ARQ (which does not feel right).

To summarize, I am neither against nor overly excited by what you suggest.
It makes sense, but I would probably not use it.
In Apache lingo, I would say: 0.

Paolo

> 
> Rob
>