
Posted to dev@jena.apache.org by "A. Soroka" <aj...@virginia.edu> on 2016/07/01 14:46:39 UTC

Jena core vs. extensions?

In discussion on JENA-1015 (Commons RDF module) the question has arisen (as Andy put it there):

'How can we have a core set of modules (base to Fuseki maybe) and some other modules that range from "additional" to "experimental" while at the same time not getting into too much release overhead.'

I'll throw out some off-the-top-of-my-head ideas just to get conversation started:

In another project I have worked with, the approach was to have tiers within the project itself (in that example, a "labs" tier, an "extensions" tier, and a "core" tier). Higher tiers committed the project itself (read: committers) to more intense support, but also carried stricter entry requirements. E.g. to get into the "extensions" tier, a module needs committed support from two institutions that are members of the larger organization associated with that project as a whole. Obviously, that doesn't work directly for an Apache project, but mutatis mutandis we could develop a similar scheme.

Maybe Jena could support two classes of modules, core and not-core. Core would be as Andy describes above; not-core would include everything else. Just as a strawman, we could say that in order to support a not-core module, Jena might require the (voluntary) assignment of at least N committers to it who will maintain responsibility for it. (N=2 or 3, maybe?) Only those committers would normally make releases of that particular module, and they would have the responsibility at minimum to see to it that the latest release of their module works with the latest release of the core. This scheme would have the advantage of letting us partition the modules already in the project as well as making some space for possible future extensions, but it would require some organizational work and some overhead to set up and run. (Although it might reduce the overhead for a core release, which is a nice thought.)

On the other hand, a real sharp and simple (and inexpensive) approach would be to say that Jena just doesn't maintain any non-core modules. If a non-core module is interesting enough to a community that intersects with that of Jena, it's on that community to find a home for it, either independently or via some other project (Apache or no). For example, I'd be curious as to whether the Commons RDF project could help host a Jena impl of Commons RDF. This scheme has the advantage of minimal burden on Jena.

In the field already we have https://labs.apache.org/, but that is focused on the efforts of individual committers and (IIRC) explicitly excludes making any releases.

Other flavors? Other thoughts?

---
A. Soroka
The University of Virginia Library

> On Jul 1, 2016, at 6:30 AM, Andy Seaborne (JIRA) <ji...@apache.org> wrote:
> 
> 
>    [ https://issues.apache.org/jira/browse/JENA-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357808#comment-15357808 ] 
> 
> Andy Seaborne edited comment on JENA-1015 at 7/1/16 10:30 AM:
> --------------------------------------------------------------
> 
> There is a useful bigger discussion here that would be better on dev@.
> 
> How can we have a core set of modules (base to Fuseki maybe) and some other modules that range from "additional" to "experimental" while at the same time not getting into too much release overhead.
> 
> 
> 
> 
>> Commons RDF module
>> ------------------
>> 
>>                Key: JENA-1015
>>                URL: https://issues.apache.org/jira/browse/JENA-1015
>>            Project: Apache Jena
>>         Issue Type: New Feature
>>         Components: Jena
>>           Reporter: A. Soroka
>>           Priority: Minor
>> 
>> Based on a thread on the dev@ mailing list:
>> http://markmail.org/search/?q=jena+commons+rdf#query:jena%20commons%20rdf%20list%3Aorg.apache.incubator.jena-dev+page:1+mid:jjljtijtw36f3jf3+state:results
>> and mention on the users@ mailing list, there is some desire to implement the Commons RDF API:
>> https://commonsrdf.incubator.apache.org/
>> This issue is to track just such an effort.
> 

Re: Commons RDF (was: Jena core vs. extensions?)

Posted by Stian Soiland-Reyes <st...@apache.org>.

Thank you! I'll adapt it into a new branch and update if needed.

On 2 Jul 2016 12:27 p.m., "Andy Seaborne" <an...@apache.org> wrote:

Commons RDF (was: Jena core vs. extensions?)

Posted by Andy Seaborne <an...@apache.org>.

On 02/07/16 01:23, Stian Soiland-Reyes wrote:
> I think Andy's approach is great, I have extended it with parser
> support in https://github.com/stain/commonsrdf-parser-jena - but I've
> not submitted this to Commons RDF yet; as Andy's code was not
> formally contributed to ASF I didn't want to do this on his behalf. I
> had considered to do a "clean room" approach - but of course my mind
> would be tainted :) Feel free to use my code though!
>
> So a similar jena module/branch would very much be welcome at Commons
> RDF - I would be very pleased if you would consider to contribute to
> it! :)

The code is licensed with the Apache License.

https://github.com/afs/commonsrdf-jena/blob/master/LICENSE

The Apache License grants you the right to use and modify the code. 
Contribution is not a precondition!

If you would prefer me to contribute: I've zipped it up and attached it 
to JENA-1015.

https://issues.apache.org/jira/secure/attachment/12815904/2016-07-02_commonsrdf-jena.zip

	Andy

Re: Jena core vs. extensions?

Posted by Stian Soiland-Reyes <st...@apache.org>.

On 1 July 2016 at 15:46, A. Soroka <aj...@virginia.edu> wrote:

> On the other hand, a real sharp and simple (and inexpensive) approach would be to say that Jena just doesn't maintain any non-core modules. If a non-core module is interesting enough to a community that intersects with that of Jena, it's on that community to find a home for it, either independently or via some other project (Apache or no). For example, I'd be curious as to whether the Commons RDF project could help host a Jena impl of Commons RDF. This scheme has the advantage of minimal burden on Jena.

Hi!

Glad you are taking an interest in Commons RDF and Jena :)

For Commons RDF we have now dropped the earlier goal of being an
"upper" interface for all the RDF frameworks. It could still very
well work like that, but I think the problem that held us back was
stability: framework X didn't want to integrate with
Commons RDF before it was "stable" (as in 1.0 and graduated from the
incubator?), while Commons RDF didn't move forward without having any
integrations to test it with.

So we have reconsidered our goals, and as our target is to
graduate as a component of Apache Commons, we are now looking
at hosting integration modules ourselves as part of
Commons RDF, similar to how Commons VFS has a set of implementations.

Naturally these would then be wrappers, rather than the tighter integration
you would get if you could modify the native classes of the framework - but if
you do it carefully, it can be done without too much
"translation" or cloning overhead.
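A minimal Java sketch of the wrapper idea, assuming hypothetical types: `NativeUriNode` stands in for a framework's own node class and `CommonIri` for a framework-neutral interface. Neither is a real Jena or Commons RDF type. The wrapper keeps a reference to the native object and translates each call when it is made, so exposing a native node through the common interface copies nothing.

```java
import java.util.Objects;

// Hypothetical stand-in for a framework's native node class (not a real Jena type).
final class NativeUriNode {
    private final String uri;

    NativeUriNode(String uri) {
        this.uri = Objects.requireNonNull(uri);
    }

    String getURI() {
        return uri;
    }
}

// Hypothetical framework-neutral interface (not the real Commons RDF API).
interface CommonIri {
    String getIriString();

    String ntriplesString();
}

// Wrapper: keeps a reference to the native object and translates each call
// on demand, so no data is cloned when a native node is exposed.
final class WrappedIri implements CommonIri {
    private final NativeUriNode node;

    WrappedIri(NativeUriNode node) {
        this.node = Objects.requireNonNull(node);
    }

    // Lets framework-aware callers get the original native object back.
    NativeUriNode asNative() {
        return node;
    }

    @Override
    public String getIriString() {
        return node.getURI();
    }

    @Override
    public String ntriplesString() {
        return "<" + node.getURI() + ">";
    }

    // Equality by IRI string; assumes every CommonIri implementation follows
    // the same rule so wrappers from different frameworks can compare equal.
    @Override
    public boolean equals(Object o) {
        return o instanceof CommonIri
                && ((CommonIri) o).getIriString().equals(getIriString());
    }

    @Override
    public int hashCode() {
        return getIriString().hashCode();
    }
}

public class WrapperSketch {
    public static void main(String[] args) {
        NativeUriNode n = new NativeUriNode("http://example.org/a");
        WrappedIri iri = new WrappedIri(n);
        System.out.println(iri.ntriplesString()); // prints <http://example.org/a>
        System.out.println(iri.asNative() == n);  // prints true (same object, not a copy)
    }
}
```

The trade-off mentioned above shows up here: the wrapper cannot add methods to the native class itself, so framework-specific code has to unwrap through something like the hypothetical `asNative()`.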

For instance, I would say the rdf4j module is nearly complete now:

https://github.com/apache/incubator-commonsrdf/tree/rdf4j/rdf4j

(Note that this is based on the parser-with-quads branch which
includes new Dataset/Quad interfaces and the RDFParserBuilder
interface. Feedback very much welcome on both of these! See
https://github.com/apache/incubator-commonsrdf/pulls )

I think Andy's approach is great, I have extended it with parser
support in https://github.com/stain/commonsrdf-parser-jena - but I've
not submitted this to Commons RDF yet; as Andy's code was not
formally contributed to ASF I didn't want to do this on his behalf. I
had considered a "clean room" approach - but of course my mind
would be tainted :) Feel free to use my code though!

So a similar jena module/branch would very much be welcome at Commons
RDF - I would be very pleased if you would consider contributing to
it! :) There is perhaps a lot of "know-how" baked into the rdf4j
approach that we should document - but it would be good to know your
take as well, and any questions that come up.

There should not be a problem with Apache Commons RDF hosting the
integrations, as (once it graduates to Commons) any ASF committer can
update/fix them, and a submitted pull request to update some
pom.xml etc. would be easily reviewed by any of the Commons
committers.

If we change our minds, then it's very easy to move the integration
over to jena-core within the ASF - and if rdf4j wanted to do the same, I am
willing to contribute my commits on that separately to the Eclipse
Foundation.

--
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons
http://orcid.org/0000-0001-9842-9718

Re: Jena core vs. extensions?

Posted by Andy Seaborne <an...@apache.org>.

The thoughts behind my original comment were driven as much by getting
the last release out as by a potential new module.

Ideally, everything gets released synchronously with all versions in 
step. But in a big system, the ideal isn't always possible.

At the moment, the upgrade to Apache POM 18 has broken the jena maven
tools (I have no idea why). This is a dilemma: do we upgrade to the better
process, with one module excluded and held back on the old version (two
versions back)? In this case, the upgrade was more to keep track of the
parent, but suppose it had been for a bug fix.

Some parts of the Jena release are used more widely than other parts.
It seems prudent to accept that some releases may cover only part of the set
of modules. And rather than discussing that at release time, it is
useful to discuss what the community wants ahead of that busy time.

This extends to new modules that might arise. Applying the same
criteria to a new piece of work (e.g. works in all cases) is IMO too
high a barrier. The criteria I'd like to use are more like: the code
is legally clean and does something useful.

That said, for new work, the approach of a separate GitHub repo, not
formally connected with the project, seems the least work for the
project. I'm not a fan of the idea of Project Jena encompassing
everything, because the PMC can't be as effective if it has to watch
everything. "Modules" with a different community are healthy.

This comes full circle: that new work might need tweaks to
"jena-main" (for want of another label) which are quite small in
themselves but enable the new work to progress. Luckily the new work can
depend on snapshot builds for a while, but maybe a release specifically
for jena-main changes, or a specific release of jena-main because of some
bug, could come along.

	Andy

On 04/07/16 17:42, Stian Soiland-Reyes wrote:

Re: Jena core vs. extensions?

Posted by Stian Soiland-Reyes <st...@apache.org>.

On 4 July 2016 at 17:23, A. Soroka <aj...@virginia.edu> wrote:

> This seems a bit problematic to me. What about the cases (which are really the cases of interest) for which there is no horizon in time at which the work is to "catch up" in any particular sense, for which the extension modules have an independent future?

That is indeed the danger: that approach works well for additions
that aim to be integrated. But that said, Fuseki 2 has managed to become
integrated.

> This seems intriguing-- are you suggesting that the new folks would be able to make releases out of the special "sidecar" code-base? What relationship would this proposal have with the formal incubation process? This is kind of asking for Jena to have multiple repositories under the Apache org, which seems reasonable to me, but then, I don't understand the implications for Infra and the work they would have to do for this.

Yes, in practice the actual release would have to be done by
an existing committer, but there is nothing stopping a PMC from owning
multiple git repositories, see for instance: http://git.apache.org/

It's just a few clicks in Jira to ask Infra to make a new repository,
followed by a few checks that, say, the GitHub pull request integrations
are working.

> I think this makes sense for a lot of possible projects (larger ones, mostly). But the example in hand (a Commons RDF impl) is one where the size of project doesn't seem to merit a new Apache incubation, right?

No, that's the kind of thing a PMC can just bootstrap on its own.

BTW - I've now added
https://github.com/apache/incubator-commonsrdf/tree/jena/jena so I
guess that particular example is now moot :)

> I like this, in many ways. It's low-cost, simple, and flexible. But I don't know enough about Apache methods to understand what the implications are for "ownership". Is it possible for a project (Jena) to "lay claim" to a GitHub org like that without some kind of formal arrangement with GitHub, and if not, is that formalization difficult or costly?

Such a GitHub group would not be formally associated with ASF or Jena
PMC (and that would have to be made clear) - just with particular
individuals who happen to also be part of Jena.

See the bottom of
https://taverna.incubator.apache.org/download/code/#taverna-extras
for how we did this - note that we used it more as a 'dumping ground'
for license-incompatible stuff. That kind of thing would earlier have
been done at "Apache Extras" on Google Code
https://code.google.com/a/apache-extras.org/hosting/ - but as Google
Code is now read-only, that is no longer an option.


Re: Jena core vs. extensions?

Posted by "A. Soroka" <aj...@virginia.edu>.

Comments in-line.

> On Jul 1, 2016, at 8:58 PM, Stian Soiland-Reyes <st...@apache.org> wrote:
> 
> On 1 July 2016 at 15:46, A. Soroka <aj...@virginia.edu> wrote:
> 
>> might require the (voluntary) assignment of at least N committers to it who will maintain responsibility for it. (N=2 or 3, maybe?) Only those committers would normally make releases of that particular module, and they would have the responsibility at minimum to see to it that the latest release of their module works with the latest release of the core. 
> 
> In effect this is what we already have with some of the modules, e.g. elephas and Fuseki. I wouldn't want to formalize such a grouping, as it could mean that, say, the Fuseki group would be scared away from fixing core, or vice versa.

Hm, I'm not sure whether it would discourage people from working on core or encourage new people (who might not be ready to work on the core) to work on the periphery. But in any case, this suggestion had the disadvantage of requiring more overhead, so it wasn't a great idea anyway. I'm not going to make any lengthy argument for it.

> For "new" and "experimental" things, why not try one of these:
> 
> a) "Anyone" can ask to have a feature branch for their new thing in https://github.com/apache/jena repository. They can contribute by GitHub pull request, but merging into "their" feature branch. 
...
> Another issue with this approach is releases - e.g. let's say the new feature was Fuseki and you want a release based on the stable core - but the rest of Jena core master is not ready for a release right now, e.g. it's upgrading to Java 9. So now you might have to do a slightly dirty "partial release" of just the new module, with modified poms to rely on the stable core. This means you get mixed tags and tricky history - but it's OK if you then later "catch up" and the module agrees to keep the same dev/release speed as core.

This seems a bit problematic to me. What about the cases (which are really the cases of interest) for which there is no horizon in time at which the work is to "catch up" in any particular sense, for which the extension modules have an independent future?

> b) Someone proposes a new feature or module, perhaps a fresh or half-baked code base, and Jena PMC asks Infra to make a new ASF repository, e.g. https://github.com/apache/jena-fuseki (let's say Fuseki was new today). ... Here the question is what happens if the N<3 volunteers of the new component all wither away - then basically that repository is "at risk"

This seems intriguing-- are you suggesting that the new folks would be able to make releases out of the special "sidecar" code-base? What relationship would this proposal have with the formal incubation process? This is kind of asking for Jena to have multiple repositories under the Apache org, which seems reasonable to me, but then, I don't understand the implications for Infra and the work they would have to do for this.

> c) The newcomer proposes an incubator project with Jena as the Sponsoring PMC. This has the danger of going towards a "Jakarta" model with many large projects that stop being related to each other - but it's probably a better option if the new thing looks like it might become a separate product; the podling could then build its own community and have a choice to graduate to either a TLP (if Incubator PMC so agrees) or become a formal part of Jena (if Jena PMC so agrees).

I think this makes sense for a lot of possible projects (larger ones, mostly). But the example in hand (a Commons RDF impl) is one where the size of project doesn't seem to merit a new Apache incubation, right?

> d) You set up a "jena-extras" GitHub organization with very low barriers to contributing. Basically anyone can start a git repository and ask to transfer it to jena-extras for shared custodianship. Anyone who contributes is added to the organization, in addition to the ASF Jena committers (if they want to be). The problem with this is that it's easy to end up with lax IP governance and you'll have a harder job later to move it into Jena (e.g. you would need to do a formal Software Grant) - however that can also be an advantage if the experiment wanted to try dependencies with incompatible licenses, e.g. 4store (GPL). Community-wise there's a danger that this would form a "lower tier" and risk making it harder to become part of the Jena PMC, as any new and cool stuff could just be sent straight to jena-extras, where the newcomers don't get to learn the "Apache Way" and perhaps
> don't even communicate with each other's modules.

I like this, in many ways. It's low-cost, simple, and flexible. But I don't know enough about Apache methods to understand what the implications are for "ownership". Is it possible for a project (Jena) to "lay claim" to a GitHub org like that without some kind of formal arrangement with GitHub, and if not, is that formalization difficult or costly?


Re: Jena core vs. extensions?

Posted by Stian Soiland-Reyes <st...@apache.org>.

On 1 July 2016 at 15:46, A. Soroka <aj...@virginia.edu> wrote:

> might require the (voluntary) assignment of at least N committers to it who will maintain responsibility for it. (N=2 or 3, maybe?) Only those committers would normally make releases of that particular module, and they would have the responsibility at minimum to see to it that the latest release of their module works with the latest release of the core. This scheme would have the advantage of letting us partition the modules already in the project as well as making some space for possible future extensions, but it would require some organizational work and some overhead to get set up and run. (Although it might reduce the overhead for a core release, which is a nice thought.)

In effect this is what we already have with some of the modules, e.g.
elephas and Fuseki. I wouldn't want to formalize such a grouping, as
it could mean that, say, the Fuseki group would be scared away from
fixing core, or vice versa.

For "new" and "experimental" things, why not try one of these:

a) "Anyone" can ask to have a feature branch for their new thing in
the https://github.com/apache/jena repository. They contribute by
GitHub pull request, but merge into "their" feature branch. As pull
requests are social, this might build interest in the new features,
and others might also contribute to the branch - which potentially
would stabilize and join the main "master" branch to become part of a
full Jena release. As usual, regular PRs and wider engagement put
them on the track to committership - but if they only commit to their
special branch without making the effort to merge into master, they
might not qualify.

We used this in another project with GSoC students. One practical
problem is that there can be many branches to keep track of
(e.g. a pull request suddenly goes to master instead), and that the
branch can lag behind, e.g. your feature ends up using a SNAPSHOT of
core. But branches are cheap, and if the experiment turns out to be a
flop, it can just be tagged and then the branch deleted. (It can
be resurrected from the tag if needed.)

Another issue with this approach is releases. Let's say the new
feature was Fuseki and you want a release based on the stable core,
but the rest of Jena core master is not ready for a release right
now, e.g. it's upgrading to Java 9. Then you might have to do a
slightly dirty "partial release" of just the new module, with modified
poms that depend on the stable core. This means you get mixed tags and
tricky history - but it's OK if you later "catch up" and the
module agrees to keep the same dev/release speed as core.

b) Someone proposes a new feature or module, perhaps a fresh or
half-baked code base, and the Jena PMC asks Infra to make a new ASF
repository, e.g. https://github.com/apache/jena-fuseki (let's say
Fuseki was new today). They contribute by pull requests - it's all
done on dev@jena as above, and again interest might build up.
Releases and dependencies against Jena Core can then be done cleanly,
at least as long as you don't need a SNAPSHOT feature in Core. This
would enable a 'faster' route to committership, I think - although the
new committers would start with experience mainly in "their"
repository, they might need to do things like releasing Jena Core if
they need a SNAPSHOT feature. Here the question is what happens if
the N<3 volunteers of the new component all wither away - then
basically that repository is "at risk".

c) The newcomer proposes an incubator project with Jena as the
Sponsoring PMC. This has the danger of drifting towards a "Jakarta" model
with many large projects that stop being related to each other - but
it's probably a better option if the new thing looks like it might
become a separate product; the podling could then build its own
community and have a choice to either graduate to a TLP (if the Incubator
PMC so agrees) or become a formal part of Jena (if the Jena PMC so agrees).
The disadvantage is that the podling would not be as connected
to Jena to start with, and the newcomers won't be on track for Jena
committership. (However, they will have to learn the Apache Way through
the podling.)

d) You set up a "jena-extras" GitHub organization with very low
barriers to contributing. Basically anyone can start a git repository
and ask to transfer it to jena-extras for shared custodianship. Anyone
who contributes is added to the organization, in addition to the ASF
Jena committers (if they want to be). The problem with this is that it's
easy to end up with lax IP governance, and you'll have a harder job
later moving it into Jena (e.g. you would need to do a formal
Software Grant) - however, that can also be an advantage if the
experiment wanted to try dependencies with incompatible licenses, e.g.
4store (GPL). Community-wise, there's a danger that this would form a
"lower tier" and make it harder to become part of the Jena PMC,
as any new and cool stuff could just be sent straight to jena-extras,
where the newcomers don't get to learn the "Apache Way" and perhaps
don't even communicate with each other's modules.

Discuss! ;-)
