Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2015/01/24 19:28:57 UTC

Release planning : 2.*

== Jena 2.*

Some first thoughts for the next release ...

What number should it be?  We may have new components:

* OSGi bundle
* Adds Fuseki2 (as 2.0.0); Fuseki1 still there
* jena-elephas?
   (If this works for you, Rob - no strong advocacy either way)
* Anything else I've forgotten.

We don't have to do one release so if timing works better for new 
components, we can do 2.13.x , 2.14.x, to indicate new modules.

Personal opinion: If it gets too complicated or too lockstep, then we 
should split to do a 2.13.0 real-soon-now and then do 2.14.0 fairly soon 
afterwards.  Numbers are not in limited supply.

2.12.1 was early Oct so we are still well within our target of 6 month 
releases.

== Jena3

Very soon after 2.12.2 or 2.12.3, kick off Jena3.

I'll start another thread for Jena3 process and use this thread for 2.x 
releases.

	Andy

Re: Release planning : 2.*

Posted by Rob Vesse <rv...@dotnetrdf.org>.

Thanks Andy

That sounds like a sensible compromise to me

Rob

On 01/02/2015 11:29, "Andy Seaborne" <an...@apache.org> wrote:

>> On 26/01/15 19:11, Rob Vesse wrote:
>> Could people please review the following JIRA filter:
>>>
>>> http://s.apache.org/jena-no-fix-version
>>>
>>>
>>> This lists all resolved Jena issues that don't have a Fix Version set,
>>>I'm
>>> not expecting people to go back and retroactively change all ~200 or so
>>> issues but certainly those that have been fixed in the last couple of
>>> months could you please Reopen, update the Fix Version field
>>>appropriately
>>> and then Close again.
>>>
>>
>
>On 27/01/15 13:07, Andy Seaborne wrote:
>>
>> Good idea.
>>
>> We might be able to do with three bulk changes of the JIRA : open,
>> update, close if the process does not lose the grouping.  I may try it
>> (on a bulk update of one).
>>
>> If so, maybe the best way is to find a cut off point (2.12.0?) and set
>> that for all that closed before that release date (2nd August 2014)
>>
>> "Are you feeling lucky?"
>
>With great trepidation, I have made some bulk changes:
>
>For issues closed before 2.11.0 (2013/Sept/08), the fix version is set
>to 2.11.0.
>
>For issues closed after 2.11.0 and before 2.12.0 (2014/Aug/02), the fix
>version is 2.12.0.
>
>So for 2.11.0-2.12.0 it is mildly accurate, and before that is set to
>2.11.0.
>
>All changes made so "updated" is 01/Feb/15 so they can be found again
>(mixed in with a few others).
>
>There are now ~10 closed/unversioned.
>
>	Hope that's OK,
>	Andy
>

Re: Release planning : 2.*

Posted by Andy Seaborne <an...@apache.org>.

> On 26/01/15 19:11, Rob Vesse wrote:
> Could people please review the following JIRA filter:
>>
>> http://s.apache.org/jena-no-fix-version
>>
>>
>> This lists all resolved Jena issues that don't have a Fix Version set, I'm
>> not expecting people to go back and retroactively change all ~200 or so
>> issues but certainly those that have been fixed in the last couple of
>> months could you please Reopen, update the Fix Version field appropriately
>> and then Close again.
>>
>

On 27/01/15 13:07, Andy Seaborne wrote:
>
> Good idea.
>
> We might be able to do with three bulk changes of the JIRA : open,
> update, close if the process does not lose the grouping.  I may try it
> (on a bulk update of one).
>
> If so, maybe the best way is to find a cut off point (2.12.0?) and set
> that for all that closed before that release date (2nd August 2014)
>
> "Are you feeling lucky?"

With great trepidation, I have made some bulk changes:

For issues closed before 2.11.0 (2013/Sept/08), the fix version is set 
to 2.11.0.

For issues closed after 2.11.0 and before 2.12.0 (2014/Aug/02), the fix 
version is 2.12.0.

So for 2.11.0-2.12.0 it is mildly accurate, and before that is set to 
2.11.0.

All changes made so "updated" is 01/Feb/15 so they can be found again 
(mixed in with a few others).

There are now ~10 closed/unversioned.
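
The bulk rule above can be written down as a small Python sketch. The function name bulk_fix_version is made up for illustration, and counting an issue closed on the release date itself as part of that release is an assumption; the message only says "before" and "after".

```python
from datetime import date
from typing import Optional

# Release dates quoted in the message above.
RELEASE_2_11_0 = date(2013, 9, 8)
RELEASE_2_12_0 = date(2014, 8, 2)


def bulk_fix_version(closed_on: date) -> Optional[str]:
    """Return the Fix Version the bulk change would assign, or None.

    Assumption: an issue closed on a release date counts towards that release.
    """
    if closed_on <= RELEASE_2_11_0:
        return "2.11.0"
    if closed_on <= RELEASE_2_12_0:
        return "2.12.0"
    # Closed after 2.12.0: not covered by the bulk change, set by hand.
    return None
```

Anything that comes back as None falls into the recent group that still needs its Fix Version set individually.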

	Hope that's OK,
	Andy

Re: Release planning : 2.*

Posted by Andy Seaborne <an...@apache.org>.

On 27/01/15 13:07, Andy Seaborne wrote:
>>> We don't have to do one release so if timing works better for new
>>> components, we can do 2.13.x , 2.14.x, to indicate new modules.
>>
>> 2.13.0 would be a good idea given the number of new components involved
>
> Agreed.

Jena main version updated to 2.13.0-SNAPSHOT

The JIRA tag has also been updated so that all references to 
2.12.2 now show 2.13.0.

	Andy

PS After an attempt with the release plugin, trying to do it 
automatically, I used:

find . -name pom.xml | \
   xargs -n 1 perl -i -p -e 's/2.12.2-SNAPSHOT/2.13.0-SNAPSHOT/'

after checking that only the right 2.12.2-SNAPSHOT occurrences would be 
changed.

This also works on projects depending on Jena.
YMMV.  No warranty or guarantees.
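
The same check-then-rewrite idea can be sketched in Python, for anyone who prefers to see what would change before touching files. The helper names count_occurrences and bump_version are hypothetical, not part of any Jena or Maven tooling. The match is on the literal version string, so the dots are not treated as regex wildcards.

```python
from pathlib import Path


def count_occurrences(root: Path, old: str) -> dict:
    """Map each pom.xml under root to how often it contains old.

    Files with no occurrence are left out, so the result doubles as the
    list of files a rewrite would touch.
    """
    hits = {}
    for pom in sorted(root.rglob("pom.xml")):
        n = pom.read_text(encoding="utf-8").count(old)
        if n:
            hits[pom] = n
    return hits


def bump_version(root: Path, old: str, new: str) -> int:
    """Replace old with new in every pom.xml under root.

    Returns the number of files rewritten.
    """
    changed = 0
    for pom in count_occurrences(root, old):
        text = pom.read_text(encoding="utf-8")
        pom.write_text(text.replace(old, new), encoding="utf-8")
        changed += 1
    return changed
```

Calling count_occurrences(Path("."), "2.12.2-SNAPSHOT") first shows which files would be rewritten; bump_version then does the rewrite. The same caveat applies: no warranty.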

Re: Release planning : 2.*

Posted by Andy Seaborne <an...@apache.org>.

On 25/02/15 15:52, Stian Soiland-Reyes wrote:
> Friday makes sense (and I guess you would let the vote run till
> Monday?), and I agree it could get risky to throw in too many fixes
> now.

That would be the minimum (72 hours) - typically, we/jena run votes from 
midnight the day it's done so 72+a bit.  UTC.

That is only one weekday though, so maybe until Tuesday midnight.
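
As a rough illustration of that arithmetic, here is a Python sketch of the earliest close for a vote that runs the 72-hour minimum from the midnight (UTC) after it is called. The function name earliest_vote_close is invented for this example, and it assumes a timezone-aware start time.

```python
from datetime import datetime, timedelta, timezone

MIN_VOTE = timedelta(hours=72)  # minimum voting period


def earliest_vote_close(called_at: datetime) -> datetime:
    """Earliest close: 72 hours after the midnight (UTC) following the call.

    Assumption: called_at is timezone-aware.
    """
    called_utc = called_at.astimezone(timezone.utc)
    # Move to the next day, then truncate to 00:00 UTC.
    next_midnight = (called_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return next_midnight + MIN_VOTE
```

For a vote called on Friday 27 February 2015 this gives 00:00 UTC on Tuesday 3 March, i.e. the end of Monday; running until Tuesday midnight adds one more weekday.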

> You are probably right in that more exciting Fuseki2 issues will come
> once it is out :)
>
>
> Shall I have a go to simply comment/hide those "bits that don't work"
> in the UI?  (e.g.  [Delete] and [Active]).  That's probably better
> than buttons that kind-of doesn't work - at least for a "first
> release", don't pretend to have features that aren't there.

Great - I'll try to catch up on those ASAP.

	Andy

>
>
>
> On 25 February 2015 at 14:44, Andy Seaborne <an...@apache.org> wrote:
>> On 25/02/15 14:18, Stian Soiland-Reyes wrote:
>>>
>>> When will you start the vote..?
>>
>>
>> Friday if all goes well.  It's been too long as it is.
>>
>>> I had a quick look at some of the outstanding Fuseki 2 issues.
>>>
>>> Unfortunately the land of Javascript is still rather mysterious to me..
>>>
>>> Given that this will be Fuseki 2.0.0, I will have another go tonight
>>> to at least try to fix JENA-869 (DELETE that doesn't),
>>
>>
>> That one looks tricky.  Even if there is a small fix, it is a symptom of
>> something generally amiss so I think it'll manifest itself elsewhere even if
>> some test cases pass.
>>
>> I'd rather go for the "delete via UI not yet implemented" approach, get
>> changes after 2.13.0 so they can be then properly tested.  "Delete" is to be
>> treated very carefully where data is concerned!
>>
>>> JENA-867
>>> (Active button that doesn't deactivate),
>>
>>
>> Probably related to JENA-869. The lifecycle of datasets needs checking.
>>
>>> JENA-865 (Example query has broken prefixes)
>>
>>
>> That looks more likely to be a small localised issue.
>>
>> ---------------------
>>
>> From my POV, "release early, release often" applies.
>>
>> I'd like to think that Fuseki 2 is a useful step forwards in the current
>> state.  It has some testing in development and seems as robust as Fuseki1
>> when used in the same way (the java server part).
>>
>> Fuseki1 is in the release as well.  That reduces risk.
>>
>> I'm fully expecting new requirements/expectations to come along as it gets
>> more used and that will set priorities and may overtake or significantly
>> modify the existing JIRA.  The real world has a tendency of throwing up the
>> unexpected (the "unknown unknowns").
>>
>> There comes a point when last minute fixes do indeed fix an observed problem
>> but can result in changes elsewhere in unexpected ways.
>>
>>          Andy
>>
>>
>>
>>>
>>> On 24 February 2015 at 18:27, Andy Seaborne <an...@apache.org> wrote:
>>>>>
>>>>> == Jena 2.*
>>>>>
>>>>> Some first thoughts for the next release ...
>>>>>
>>>>> What number should it be?  We may have have new components:
>>>>>
>>>>> * OSGi bundle
>>>>> * Adds Fuseki2 (as 2.0.0); Fuseki1 still there
>>>>> * jena-elephas?
>>>>>     (If this works for you, Rob - no strong advocacy either way)
>>>>> * Anything else I've forgotten.
>>>>
>>>>
>>>>
>>>> Final call for Jena 2.13.0.
>>>>
>>>> Everyone - please test development master.
>>>>
>>>>
>>>>
>>>> Once the OSGi matter is resolved, everything has settled down, and any
>>>> feedback acted on, I'm ready to be RM for the release.
>>>>
>>>> I have a clean copy locally and have it building - the machine has maven
>>>> 3.2.3 and a direct internet connection to central.
>>>>
>>>> Thanks to Rob's updated process instructions for using git - the "Git
>>>> Configuration Issue" warning box was needed.
>>>>
>>>>           Andy
>>>>
>>>> https://cwiki.apache.org/confluence/display/JENA/Release+Process
>>>
>>>
>>>
>>>
>>
>
>
>

Re: Release planning : 2.*

Posted by Stian Soiland-Reyes <st...@apache.org>.

Friday makes sense (and I guess you would let the vote run till
Monday?), and I agree it could get risky to throw in too many fixes
now.

You are probably right in that more exciting Fuseki2 issues will come
once it is out :)


Shall I have a go to simply comment/hide those "bits that don't work"
in the UI?  (e.g.  [Delete] and [Active]).  That's probably better
than buttons that kind-of don't work - at least for a "first
release", don't pretend to have features that aren't there.



On 25 February 2015 at 14:44, Andy Seaborne <an...@apache.org> wrote:
> On 25/02/15 14:18, Stian Soiland-Reyes wrote:
>>
>> When will you start the vote..?
>
>
> Friday if all goes well.  It's been too long as it is.
>
>> I had a quick look at some of the outstanding Fuseki 2 issues.
>>
>> Unfortunately the land of Javascript is still rather mysterious to me..
>>
>> Given that this will be Fuseki 2.0.0, I will have another go tonight
>> to at least try to fix JENA-869 (DELETE that doesn't),
>
>
> That one looks tricky.  Even if there is a small fix, it is a symptom of
> something generally amiss so I think it'll manifest itself elsewhere even if
> some test cases pass.
>
> I'd rather go for the "delete via UI not yet implemented" approach, get
> changes after 2.13.0 so they can be then properly tested.  "Delete" is to be
> treated very carefully where data is concerned!
>
>> JENA-867
>> (Active button that doesn't deactivate),
>
>
> Probably related to JENA-869. The lifecycle of datasets needs checking.
>
>> JENA-865 (Example query has broken prefixes)
>
>
> That looks more likely to be a small localised issue.
>
> ---------------------
>
> From my POV, "release early, release often" applies.
>
> I'd like to think that Fuseki 2 is a useful step forwards in the current
> state.  It has some testing in development and seems as robust as Fuseki1
> when used in the same way (the java server part).
>
> Fuseki1 is in the release as well.  That reduces risk.
>
> I'm fully expecting new requirements/expectations to come along as it gets
> more used and that will set priorities and may overtake or significantly
> modify the existing JIRA.  The real world has a tendency of throwing up the
> unexpected (the "unknown unknowns").
>
> There comes a point when last minute fixes do indeed fix an observed problem
> but can result in changes elsewhere in unexpected ways.
>
>         Andy
>
>
>
>>
>> On 24 February 2015 at 18:27, Andy Seaborne <an...@apache.org> wrote:
>>>>
>>>> == Jena 2.*
>>>>
>>>> Some first thoughts for the next release ...
>>>>
>>>> What number should it be?  We may have have new components:
>>>>
>>>> * OSGi bundle
>>>> * Adds Fuseki2 (as 2.0.0); Fuseki1 still there
>>>> * jena-elephas?
>>>>    (If this works for you, Rob - no strong advocacy either way)
>>>> * Anything else I've forgotten.
>>>
>>>
>>>
>>> Final call for Jena 2.13.0.
>>>
>>> Everyone - please test development master.
>>>
>>>
>>>
>>> Once the OSGi matter is resolved, everything has settled down, and any
>>> feedback acted on, I'm ready to be RM for the release.
>>>
>>> I have a clean copy locally and have it building - the machine has maven
>>> 3.2.3 and a direct internet connection to central.
>>>
>>> Thanks to Rob's updated process instructions for using git - the "Git
>>> Configuration Issue" warning box was needed.
>>>
>>>          Andy
>>>
>>> https://cwiki.apache.org/confluence/display/JENA/Release+Process
>>
>>
>>
>>
>



-- 
Stian Soiland-Reyes
Apache Taverna (incubating)
http://orcid.org/0000-0001-9842-9718

Re: Release planning : 2.*

Posted by Andy Seaborne <an...@apache.org>.

On 25/02/15 14:18, Stian Soiland-Reyes wrote:
> When will you start the vote..?

Friday if all goes well.  It's been too long as it is.

> I had a quick look at some of the outstanding Fuseki 2 issues.
>
> Unfortunately the land of Javascript is still rather mysterious to me..
>
> Given that this will be Fuseki 2.0.0, I will have another go tonight
> to at least try to fix JENA-869 (DELETE that doesn't),

That one looks tricky.  Even if there is a small fix, it is a symptom of 
something generally amiss so I think it'll manifest itself elsewhere 
even if some test cases pass.

I'd rather go for the "delete via UI not yet implemented" approach, get 
changes after 2.13.0 so they can be then properly tested.  "Delete" is 
to be treated very carefully where data is concerned!

> JENA-867
> (Active button that doesn't deactivate),

Probably related to JENA-869. The lifecycle of datasets needs checking.

> JENA-865 (Example query has broken prefixes)

That looks more likely to be a small localised issue.

---------------------

From my POV, "release early, release often" applies.

I'd like to think that Fuseki 2 is a useful step forwards in the current 
state.  It has some testing in development and seems as robust as 
Fuseki1 when used in the same way (the java server part).

Fuseki1 is in the release as well.  That reduces risk.

I'm fully expecting new requirements/expectations to come along as it 
gets more used and that will set priorities and may overtake or 
significantly modify the existing JIRA.  The real world has a tendency 
of throwing up the unexpected (the "unknown unknowns").

There comes a point when last minute fixes do indeed fix an observed 
problem but can result in changes elsewhere in unexpected ways.

	Andy

>
> On 24 February 2015 at 18:27, Andy Seaborne <an...@apache.org> wrote:
>>> == Jena 2.*
>>>
>>> Some first thoughts for the next release ...
>>>
>>> What number should it be?  We may have have new components:
>>>
>>> * OSGi bundle
>>> * Adds Fuseki2 (as 2.0.0); Fuseki1 still there
>>> * jena-elephas?
>>>    (If this works for you, Rob - no strong advocacy either way)
>>> * Anything else I've forgotten.
>>
>>
>> Final call for Jena 2.13.0.
>>
>> Everyone - please test development master.
>>
>>
>>
>> Once the OSGi matter is resolved, everything has settled down, and any
>> feedback acted on, I'm ready to be RM for the release.
>>
>> I have a clean copy locally and have it building - the machine has maven
>> 3.2.3 and a direct internet connection to central.
>>
>> Thanks to Rob's updated process instructions for using git - the "Git
>> Configuration Issue" warning box was needed.
>>
>>          Andy
>>
>> https://cwiki.apache.org/confluence/display/JENA/Release+Process
>
>
>

Re: Release planning : 2.*

Posted by Stian Soiland-Reyes <st...@apache.org>.

When will you start the vote..?

I had a quick look at some of the outstanding Fuseki 2 issues.

Unfortunately the land of Javascript is still rather mysterious to me..

Given that this will be Fuseki 2.0.0, I will have another go tonight
to at least try to fix JENA-869 (DELETE that doesn't), JENA-867
(Active button that doesn't deactivate), JENA-865 (Example query has
broken prefixes)

On 24 February 2015 at 18:27, Andy Seaborne <an...@apache.org> wrote:
>> == Jena 2.*
>>
>> Some first thoughts for the next release ...
>>
>> What number should it be?  We may have have new components:
>>
>> * OSGi bundle
>> * Adds Fuseki2 (as 2.0.0); Fuseki1 still there
>> * jena-elephas?
>>   (If this works for you, Rob - no strong advocacy either way)
>> * Anything else I've forgotten.
>
>
> Final call for Jena 2.13.0.
>
> Everyone - please test development master.
>
>
>
> Once the OSGi matter is resolved, everything has settled down, and any
> feedback acted on, I'm ready to be RM for the release.
>
> I have a clean copy locally and have it building - the machine has maven
> 3.2.3 and a direct internet connection to central.
>
> Thanks to Rob's updated process instructions for using git - the "Git
> Configuration Issue" warning box was needed.
>
>         Andy
>
> https://cwiki.apache.org/confluence/display/JENA/Release+Process



-- 
Stian Soiland-Reyes
Apache Taverna (incubating)
http://orcid.org/0000-0001-9842-9718

Re: Release planning : 2.13.0

Posted by Stian Soiland-Reyes <st...@apache.org>.

ah but hang on.. in fuseki 1 there's just "jena-fuseki". In Fuseki 2
"jena-fuseki" is just the parent pom, the "actual" artifacts are
jena-fuseki-core, etc.. so the parent can be called anything - in many
ways "jena-fuseki2" is even more correct as it matches its folder
name.

So a full +1 to that workaround.

On 2 March 2015 at 16:46, Andy Seaborne <an...@apache.org> wrote:
> On 02/03/15 16:07, Stian Soiland-Reyes wrote:
>>
>> That's the style used by Apache Commons (+ Java package rename) - but
>> there it is not for build reasons but for co-existence as a Maven
>> dependency.
>
>
> Co-existence is NOT going to work for all sorts of reasons ...
>
>
>
>>
>> In Jena, both fuseki versions use the package name
>> org.apache.jena.fuseki and so can't co-exist on the classpath.
>
>
> And the dependencies like Jetty ... something Apache Commons does not have
> to deal with.
>
>> Done "proper" this should probably just be two branches in a separate
>> jena-fuseki repository.. but as you always would want to release both
>> Fusekis whenever there's a new Jena (but not necessarily a new Jena
>> when there's a new Fuseki) I understand why you have put them all
>> together.
>>
>> Would it make sense for fuseki to be used as a Maven dependency? If it
>> is, then I would stick with whatever artifactId it ends up with, so if
>> becomes jena-fuseki2 now, then that's it.
>
>
> v1 - not really (see other email).
> v2 - yes
>
>> If not - perhaps just take jena-fuseki2 out of the master pom.xml and
>> build+tag it separately (longer release process, easier to forget).
>> Release plugin should ask you how to settle the SNAPSHOT dependencies
>> on Jena. When time comes, you can swap around so that jena-fuseki2 is
>> in the master pom.xml instead of fuseki 1.
>>
>> This (which is almost like splitting to a separate git repo) might
>> make sense if there will be further work on Fuseki 2 that is
>> independent from Jena (specially as we move to Jena 3?) -- it does
>> need some more to be complete on the UI side, which seems to be what
>> people fall in love with.
>
>
> As the v1 UI is basic, I think "need" is a bit out of place.
>
> v2 UI should be at least as good as v1.
>
>
>>
>>
>>
>> On 2 March 2015 at 15:37, Rob Vesse <rv...@dotnetrdf.org> wrote:
>>>
>>> I think it is a more general limitation of Maven
>>>
>>> Probably easiest thing is to call it jena-fuseki2 for the time being and
>>> then at such time as 2.x is sufficiently stable to replace 1.x we can
>>> rename again
>>>
>>> Rob
>>>
>>> On 28/02/2015 16:59, "Andy Seaborne" <an...@apache.org> wrote:
>>>
>>>> There'll be a bit of a delay in building Jena 2.13.0.
>>>>
>>>> In our setup, the release process cannot handle having multiple
>>>> versions of the same artifact (org.apache.jena:jena-fuseki).  Then the
>>>> reactor has duplicates and maven stops with an error.
>>>>
>>>> Looks like it is the release plugin.
>>>>
>>>> Whether this is because we are using an old(ish) apache parent or
>>>> whether it's an on-going problem, isn't clear yet.  Trying things out is
>>>> a slow process. Maybe a change of artifact name is needed, which itself
>>>> then needs checking in case that cascades in any way.  Or two build
>>>> cycles.
>>>>
>>>> Works:
>>>>    mvn -s settings.xml release:prepare -DdryRun=true
>>>> then fails
>>>>    mvn -s settings.xml release:prepare
>>>>
>>>> Bother.
>>>>
>>>>        Andy
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>



-- 
Stian Soiland-Reyes
Apache Taverna (incubating)
http://orcid.org/0000-0001-9842-9718

Re: Release planning : 2.13.0

Posted by Andy Seaborne <an...@apache.org>.

On 02/03/15 16:07, Stian Soiland-Reyes wrote:
> That's the style used by Apache Commons (+ Java package rename) - but
> there it is not for build reasons but for co-existence as a Maven
> dependency.

Co-existence is NOT going to work for all sorts of reasons ...



>
> In Jena, both fuseki versions use the package name
> org.apache.jena.fuseki and so can't co-exist on the classpath.

And the dependencies like Jetty ... something Apache Commons does not 
have to deal with.

> Done "proper" this should probably just be two branches in a separate
> jena-fuseki repository.. but as you always would want to release both
> Fusekis whenever there's a new Jena (but not necessarily a new Jena
> when there's a new Fuseki) I understand why you have put them all
> together.
>
> Would it make sense for fuseki to be used as a Maven dependency? If it
> is, then I would stick with whatever artifactId it ends up with, so if
> becomes jena-fuseki2 now, then that's it.

v1 - not really (see other email).
v2 - yes

> If not - perhaps just take jena-fuseki2 out of the master pom.xml and
> build+tag it separately (longer release process, easier to forget).
> Release plugin should ask you how to settle the SNAPSHOT dependencies
> on Jena. When time comes, you can swap around so that jena-fuseki2 is
> in the master pom.xml instead of fuseki 1.
>
> This (which is almost like splitting to a separate git repo) might
> make sense if there will be further work on Fuseki 2 that is
> independent from Jena (specially as we move to Jena 3?) -- it does
> need some more to be complete on the UI side, which seems to be what
> people fall in love with.

As the v1 UI is basic, I think "need" is a bit out of place.

v2 UI should be at least as good as v1.

>
>
>
> On 2 March 2015 at 15:37, Rob Vesse <rv...@dotnetrdf.org> wrote:
>> I think it is a more general limitation of Maven
>>
>> Probably easiest thing is to call it jena-fuseki2 for the time being and
>> then at such time as 2.x is sufficiently stable to replace 1.x we can
>> rename again
>>
>> Rob
>>
>> On 28/02/2015 16:59, "Andy Seaborne" <an...@apache.org> wrote:
>>
>>> There'll be a bit of a delay in building Jena 2.13.0.
>>>
>>> In our setup, the release process cannot handle having multiple
>>> versions of the same artifact (org.apache.jena:jena-fuseki).  Then the
>>> reactor has duplicates and maven stops with an error.
>>>
>>> Looks like it is the release plugin.
>>>
>>> Whether this is because we are using an old(ish) apache parent or
>>> whether it's an on-going problem, isn't clear yet.  Trying things out is
>>> a slow process. Maybe a change of artifact name is needed, which itself
>>> then needs checking in case that cascades in any way.  Or two build
>>> cycles.
>>>
>>> Works:
>>>    mvn -s settings.xml release:prepare -DdryRun=true
>>> then fails
>>>    mvn -s settings.xml release:prepare
>>>
>>> Bother.
>>>
>>>        Andy
>>>
>>
>>
>>
>>
>
>
>

Re: Release planning : 2.13.0

Posted by Stian Soiland-Reyes <st...@apache.org>.

That's the style used by Apache Commons (+ Java package rename) - but
there it is not for build reasons but for co-existence as a Maven
dependency.

In Jena, both fuseki versions use the package name
org.apache.jena.fuseki and so can't co-exist on the classpath.

Done "proper" this should probably just be two branches in a separate
jena-fuseki repository.. but as you always would want to release both
Fusekis whenever there's a new Jena (but not necessarily a new Jena
when there's a new Fuseki) I understand why you have put them all
together.

Would it make sense for fuseki to be used as a Maven dependency? If it
is, then I would stick with whatever artifactId it ends up with, so if
it becomes jena-fuseki2 now, then that's it.

If not - perhaps just take jena-fuseki2 out of the master pom.xml and
build+tag it separately (longer release process, easier to forget).
Release plugin should ask you how to settle the SNAPSHOT dependencies
on Jena. When time comes, you can swap around so that jena-fuseki2 is
in the master pom.xml instead of fuseki 1.

This (which is almost like splitting to a separate git repo) might
make sense if there will be further work on Fuseki 2 that is
independent from Jena (specially as we move to Jena 3?) -- it does
need some more to be complete on the UI side, which seems to be what
people fall in love with.

On 2 March 2015 at 15:37, Rob Vesse <rv...@dotnetrdf.org> wrote:
> I think it is a more general limitation of Maven
>
> Probably easiest thing is to call it jena-fuseki2 for the time being and
> then at such time as 2.x is sufficiently stable to replace 1.x we can
> rename again
>
> Rob
>
> On 28/02/2015 16:59, "Andy Seaborne" <an...@apache.org> wrote:
>
>>There'll be a bit of a delay in building Jena 2.13.0.
>>
>>In our setup, the release process cannot handle having multiple
>>versions of the same artifact (org.apache.jena:jena-fuseki).  Then the
>>reactor has duplicates and maven stops with an error.
>>
>>Looks like it is the release plugin.
>>
>>Whether this is because we are using an old(ish) apache parent or
>>whether it's an on-going problem, isn't clear yet.  Trying things out is
>>a slow process. Maybe a change of artifact name is needed, which itself
>>then needs checking in case that cascades in any way.  Or two build
>>cycles.
>>
>>Works:
>>   mvn -s settings.xml release:prepare -DdryRun=true
>>then fails
>>   mvn -s settings.xml release:prepare
>>
>>Bother.
>>
>>       Andy
>>
>
>
>
>

-- 
Stian Soiland-Reyes
Apache Taverna (incubating)
http://orcid.org/0000-0001-9842-9718

Re: Release planning : 2.13.0

Posted by Andy Seaborne <an...@apache.org>.

On 03/03/15 14:01, Stian Soiland-Reyes wrote:
> On 3 March 2015 at 10:41, Andy Seaborne <an...@apache.org> wrote:
>
>> Fuseki1 is in dist/binaries/ and mirrored.
>>
>> In my experience, short-term mitigation worries can leave long-term legacy
>> behind and it costs more in support.  If we can find a one-time change, I
>> think it is less work for the project.  Users have to change sometime -
>> let's make it once.
>
> I see - any fuseki1-users using those as Maven dependencies (e.g. from
> an assembly file) will probably want to migrate to the more proper
> artifacts and newer Jetty from Fuseki 2 anyway.
>
>
> Unless anyone shows up showing the opposite, I now agree with you to
> go for the v1 option with "jena-fuseki1".

Artifact jena-fuseki renamed jena-fuseki1 - I went and cleaned up 
snapshots and there should be reset builds as well.

Hopefully, dist.sh is up-to-date as well but I need to test that out. 
The downloads are set to be "jena-fuseki1-VER1-distribution.zip" and 
"apache-jena-fuseki-VER2.zip".

There's a case for renaming jena-fuseki as well but the artifacts get 
name-flattened to jena-fuseki(-*). ATM the folder structure at least has 
jena-fuseki2 then jena-fuseki2/jena-fuseki*
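
The clash that prompted the rename, two modules sharing org.apache.jena:jena-fuseki, can be looked for before running release:prepare. A minimal Python sketch follows. It assumes every pom.xml declares the standard Maven POM namespace and that a missing groupId is inherited from the parent; duplicate_coordinates is a hypothetical helper, not part of Maven or the Jena build. It compares groupId:artifactId only and ignores versions, which may be stricter than Maven itself.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

# Namespace used by standard Maven POM files.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def coordinates(pom: Path) -> str:
    """Return groupId:artifactId for one pom.xml (version is ignored)."""
    project = ET.parse(str(pom)).getroot()
    artifact = project.findtext("m:artifactId", namespaces=NS)
    # A module without its own groupId inherits the parent's.
    group = project.findtext("m:groupId", namespaces=NS) or project.findtext(
        "m:parent/m:groupId", namespaces=NS
    )
    return f"{group}:{artifact}"


def duplicate_coordinates(root: Path) -> dict:
    """Map each groupId:artifactId declared more than once to its pom.xml files."""
    seen = defaultdict(list)
    for pom in sorted(root.rglob("pom.xml")):
        seen[coordinates(pom)].append(pom)
    return {coord: poms for coord, poms in seen.items() if len(poms) > 1}
```

An empty result means no two modules under the checkout claim the same coordinates.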

	Andy

Re: Release planning : 2.13.0

Posted by Stian Soiland-Reyes <st...@apache.org>.

On 3 March 2015 at 10:41, Andy Seaborne <an...@apache.org> wrote:

> Fuseki1 is in dist/binaries/ and mirrored.
>
> In my experience, short-term mitigation worries can leave long-term legacy
> behind and it costs more in support.  If we can find a one-time change, I
> think it is less work for the project.  Users have to change sometime -
> let's make it once.

I see - any fuseki1-users using those as Maven dependencies (e.g. from
an assembly file) will probably want to migrate to the more proper
artifacts and newer Jetty from Fuseki 2 anyway.

Unless anyone shows up showing the opposite, I now agree with you to
go for the v1 option with "jena-fuseki1".

-- 
Stian Soiland-Reyes
Apache Taverna (incubating)
http://orcid.org/0000-0001-9842-9718

Re: Release planning : 2.13.0

Posted by Andy Seaborne <an...@apache.org>.

On 03/03/15 01:38, Stian Soiland-Reyes wrote:
> v1 artifacts:
>
> http://central.maven.org/maven2/org/apache/jena/jena-fuseki/1.1.1/
>
> Almost any of those could in theory be used..
>
> If somehow a Maven
> project before got jena-fuseki 1.1 the JAR and now upgrade to 2.0 the
> POM, that's not going to work very well :) (At least it should fail
> early!)

The technical part is clear -  it's the "could", "in theory" and 
"somehow" that matter.

Fuseki1 is in dist/binaries/ and mirrored.

In my experience, short-term mitigation worries can leave long-term 
legacy behind and it costs more in support.  If we can find a one-time 
change, I think it is less work for the project.  Users have to change 
sometime - let's make it once.

> The reason I backed Rob's v2 approach is that it's the least
> intrusive, and can be transient until fuseki1 is deprecated. It's
> slightly messier for the project "Why is the parent jena-fuseki2 and
> the module just jena-fuseki-*??" - but less impact for anyone else who
> should never need to care about the parent.
>
> You could also do a v2 variant where the parent is called
> jena-fuseki-parent instead of jena-fuseki / jena-fuseki2? Less
> confusion?
>
>
> If we think there are no-one using the fuseki 1 artifacts from
> Maven++, then v1 approach is probably still workable, even if it's
> more intrusive.

As a server Fuseki2 is more compliant, and probably more stable than 
Fuseki1 because it uses Jetty9.

There is an issue with Jetty8, as Fuseki1 uses it, under high load. 
Jetty9 is radically different in the area of connectors and does not go 
into weird states (it's quite likely not Jetty directly, but the way 
Java 1.6 works - Jetty9 uses java7 features).

	Andy
>
> On 2 March 2015 at 16:40, Andy Seaborne <an...@apache.org> wrote:
>> On 02/03/15 15:37, Rob Vesse wrote:
>>>
>>> I think it is a more general limitation of Maven
>>>
>>> Probably easiest thing is to call it jena-fuseki2 for the time being and
>>> then at such time as 2.x is sufficiently stable to replace 1.x we can
>>> rename again
>>
>>
>> It's just the release plugin at the moment. What I think is happening is
>> that when it rewrites the version ids after asking what to set them to, it
>> overwrites regardless.  The dialog gets the right answer; it's the rewrite
>> stage that does not map current->release ids, just overwrites with release
>> ids.
>>
>> There may be other lurking issues as well.  It's supposed to work in a
>> sufficiently recent maven.  That said, while it has been working in
>> development builds, relying on the version might be too clever.
>>
>>> jena-fuseki2
>>
>> All the choices:
>>
>> 1/ (V1) jena-fuseki, (V2) jena-fuseki2, jena-fuseki-war, ...
>>     Just the clashing artifact renamed.
>>
>> 2/ (V1) jena-fuseki, (V2) jena-fuseki2, jena-fuseki2-war, ...
>>     Rename all V2, leave v1
>>
>> 3/ (V1) jena-fuseki1, (V2) jena-fuseki, jena-fuseki-war, ...
>>     Rename v1 only.
>>
>> 4/ (V1) jena-fuseki1, (V2) jena-fuseki2, jena-fuseki2-war, ...
>>     All modules have the major version.
>>
>> I'd like to do it by making Fuseki v1 "jena-fuseki1", leave "jena-fuseki"
>> for Fuseki2 - option 3.  4's OK; 1 seems like trouble. 2 isn't clear to my
>> way of thinking.  "jena-fuseki" is a
>>
>> Fuseki2 does have artifacts (WAR file; the server jar is done as an
>> artifact, not a classifier addition; an embedded version sometime). That
>> makes a second rename of Fuseki v2 artifact(s) less desirable.
>>
>> This isn't a strongly held position.  The underlying assumption is that
>> Fuseki v1 is not used as an artifact -- only as a distribution.
>>
>> Of course, I can't be sure that no one outside the build uses it as an
>> artifact so if anyone thinks it's a bad idea, do say so.
>>
>> There is an internal artifact use of Fuseki v1 by jena-jdbc-driver-remote
>> and jena-jdbc-driver-bundle (2 each).  That causes a different, minor
>> problem, which I didn't understand.  When resolving dependencies in
>> release:prepare, that useage causes a "you have SNAPSHOTs do you want to fix
>> them" dialog from the release plugin (inc dry run).  You get 2 lots of two
>> requests to fix the SNAPSHOT version.  In case that was the cause of the
>> major issue, I set it to a fixed 1.1.1 but it didn't change anything other
>> than removing the additional dialog.
>>
>>          Andy
>>
>>
>>>
>>> Rob
>>>
>>> On 28/02/2015 16:59, "Andy Seaborne" <an...@apache.org> wrote:
>>>
>>>> There'll be a bit of a delay in building Jena 2.13.0.
>>>>
>>>> In our setup, the release process can't not handle having multiple
>>>> versions of the same artifact (org.apache.jena:jena-fuseki).  Then the
>>>> reactor has duplicates and maven stops with an error.
>>>>
>>>> Looks like it is the release plugin.
>>>>
>>>> Whether this is because we are using an old(ish) apache parent or
>>>> whether it's an on-going problem, isn't clear yet.  Trying things out is
>>>> a slow process. Maybe a change of artifact name is needed, which itself
>>>> then needs checking in case that cascades in any way.  Or two build
>>>> cycles.
>>>>
>>>> Works:
>>>>     mvn -s settings.xml release:prepare -DdryRun=true
>>>> then fails
>>>>     mvn -s settings.xml release:prepare
>>>>
>>>> Bother.
>>>>
>>>>          Andy
>>>>
>>>
>>>
>>>
>>>
>>
>
>
>

Re: Release planning : 2.13.0

Posted by Stian Soiland-Reyes <st...@apache.org>.

v1 artifacts:

http://central.maven.org/maven2/org/apache/jena/jena-fuseki/1.1.1/

Almost any of those could in theory be used.  If a Maven project
previously depended on the jena-fuseki 1.1 JAR and now upgrades to 2.0,
which is only a POM, that's not going to work very well :) (At least it
should fail early!)


The reason I backed Rob's v2 approach is that it's the least
intrusive, and can be transient until fuseki1 is deprecated. It's
slightly messier for the project "Why is the parent jena-fuseki2 and
the module just jena-fuseki-*??" - but less impact for anyone else who
should never need to care about the parent.

You could also do a v2 variant where the parent is called
jena-fuseki-parent instead of jena-fuseki / jena-fuseki2? Less
confusion?


If we think there is no one using the Fuseki 1 artifacts from
Maven++, then the v1 approach is probably still workable, even if it's
more intrusive.



On 2 March 2015 at 16:40, Andy Seaborne <an...@apache.org> wrote:
> On 02/03/15 15:37, Rob Vesse wrote:
>>
>> I think it is a more general limitation of Maven
>>
>> Probably easiest thing is to call it jena-fuseki2 for the time being and
>> then at such time as 2.x is sufficiently stable to replace 1.x we can
>> rename again
>
>
> It's just the release plugin at the moment. What I think is happening is
> that when it rewrites the version ids after asking what to set them to, it
> overwrites regardless.  The dialog gets the right answer; it's the rewrite
> stage that does not map current->release ids, just overwrites with release
> ids.
>
> There may be other lurking issues as well.  It's supposed to work in a
> sufficiently recent maven.  That said, while it has been working in
> development builds, relying on the version might be too clever.
>
>> jena-fuseki2
>
> All the choices:
>
> 1/ (V1) jena-fuseki, (V2) jena-fuseki2, jena-fuseki-war, ...
>    Just the clashing artifact renamed.
>
> 2/ (V1) jena-fuseki, (V2) jena-fuseki2, jena-fuseki2-war, ...
>    Rename all V2, leave v1
>
> 3/ (V1) jena-fuseki1, (V2) jena-fuseki, jena-fuseki-war, ...
>    Rename v1 only.
>
> 4/ (V1) jena-fuseki1, (V2) jena-fuseki2, jena-fuseki2-war, ...
>    All modules have the major version.
>
> I'd like to do it by making Fuseki v1 "jena-fuseki1", leave "jena-fuseki"
> for Fuseki2 - option 3.  4's OK; 1 seems like trouble. 2 isn't clear to my
> way of thinking.  "jena-fuseki" is a
>
> Fuseki2 does have artifacts (WAR file; the server jar is done as an
> artifact, not a classifier addition; an embedded version sometime). That
> makes a second rename of Fuseki v2 artifact(s) less desirable.
>
> This isn't a strongly held position.  The underlying assumption is that
> Fuseki v1 is not used as an artifact -- only as a distribution.
>
> Of course, I can't be sure that no one outside the build uses it as an
> artifact so if anyone thinks it's a bad idea, do say so.
>
> There is an internal artifact use of Fuseki v1 by jena-jdbc-driver-remote
> and jena-jdbc-driver-bundle (2 each).  That causes a different, minor
> problem, which I didn't understand.  When resolving dependencies in
> release:prepare, that usage causes a "you have SNAPSHOTs do you want to fix
> them" dialog from the release plugin (inc dry run).  You get 2 lots of two
> requests to fix the SNAPSHOT version.  In case that was the cause of the
> major issue, I set it to a fixed 1.1.1 but it didn't change anything other
> than removing the additional dialog.
>
>         Andy
>
>
>>
>> Rob
>>
>> On 28/02/2015 16:59, "Andy Seaborne" <an...@apache.org> wrote:
>>
>>> There'll be a bit of a delay in building Jena 2.13.0.
>>>
>>> In our setup, the release process cannot handle having multiple
>>> versions of the same artifact (org.apache.jena:jena-fuseki).  Then the
>>> reactor has duplicates and maven stops with an error.
>>>
>>> Looks like it is the release plugin.
>>>
>>> Whether this is because we are using an old(ish) apache parent or
>>> whether it's an on-going problem, isn't clear yet.  Trying things out is
>>> a slow process. Maybe a change of artifact name is needed, which itself
>>> then needs checking in case that cascades in any way.  Or two build
>>> cycles.
>>>
>>> Works:
>>>    mvn -s settings.xml release:prepare -DdryRun=true
>>> then fails
>>>    mvn -s settings.xml release:prepare
>>>
>>> Bother.
>>>
>>>         Andy
>>>
>>
>>
>>
>>
>



-- 
Stian Soiland-Reyes
Apache Taverna (incubating)
http://orcid.org/0000-0001-9842-9718

Re: Release planning : 2.13.0

Posted by Andy Seaborne <an...@apache.org>.

On 02/03/15 15:37, Rob Vesse wrote:
> I think it is a more general limitation of Maven
>
> Probably easiest thing is to call it jena-fuseki2 for the time being and
> then at such time as 2.x is sufficiently stable to replace 1.x we can
> rename again

It's just the release plugin at the moment. What I think is happening is 
that when it rewrites the version ids after asking what to set them to, 
it overwrites regardless.  The dialog gets the right answer; it's the 
rewrite stage that does not map current->release ids, just overwrites 
with release ids.

There may be other lurking issues as well.  It's supposed to work in a 
sufficiently recent maven.  That said, while it has been working in 
development builds, relying on the version might be too clever.
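
To make the suspected failure mode concrete, here is a toy Python model. It is only a sketch of the hypothesis above, not the release plugin's actual code: the reactor is assumed to reject projects with identical groupId:artifactId:version, the rewrite is assumed to keep one release version per groupId:artifactId, and the SNAPSHOT and release versions are invented for illustration.

```python
# Toy model of the hypothesis above; NOT the release plugin's real code.
# All version numbers here are illustrative assumptions.
from collections import Counter

reactor = [
    ("org.apache.jena", "jena-fuseki", "1.1.2-SNAPSHOT"),  # Fuseki v1
    ("org.apache.jena", "jena-fuseki", "2.0.0-SNAPSHOT"),  # Fuseki v2
]


def duplicates(projects):
    """Return the coordinates that occur more than once in the reactor."""
    counts = Counter(projects)
    return [p for p, n in counts.items() if n > 1]


def rewrite_by_overwrite(projects, answers):
    """Rewrite keyed on groupId:artifactId only, so the last answer wins."""
    release = {}
    for (g, a, _current), new in zip(projects, answers):
        release[(g, a)] = new  # the second entry silently replaces the first
    return [(g, a, release[(g, a)]) for g, a, _ in projects]


def rewrite_by_mapping(projects, answers):
    """Rewrite keyed on the full current coordinates (current -> release)."""
    release = dict(zip(projects, answers))
    return [(g, a, release[(g, a, v)]) for g, a, v in projects]


answers = ["1.1.2", "2.0.0"]  # what the dialog was told, in reactor order

dev_build = duplicates(reactor)                  # distinct versions: no clash
overwritten = rewrite_by_overwrite(reactor, answers)
mapped = rewrite_by_mapping(reactor, answers)
```

In this model the development build has no duplicates, the overwriting rewrite leaves both modules at 2.0.0 (so the reactor sees a duplicate), and the current->release mapping keeps them apart.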

 > jena-fuseki2

All the choices:

1/ (V1) jena-fuseki, (V2) jena-fuseki2, jena-fuseki-war, ...
    Just the clashing artifact renamed.

2/ (V1) jena-fuseki, (V2) jena-fuseki2, jena-fuseki2-war, ...
    Rename all V2, leave v1

3/ (V1) jena-fuseki1, (V2) jena-fuseki, jena-fuseki-war, ...
    Rename v1 only.

4/ (V1) jena-fuseki1, (V2) jena-fuseki2, jena-fuseki2-war, ...
    All modules have the major version.
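
The four options can be compared mechanically. The Python sketch below only copies the artifact ids from the list above (the "..." entries are left out) and checks two things: whether v1 and v2 still share an id, and which options change the existing v1 id. It is a bookkeeping aid, not a statement about how Maven resolves anything.

```python
# Artifact ids per option, copied from the list above ("..." entries omitted).
options = {
    1: {"v1": {"jena-fuseki"}, "v2": {"jena-fuseki2", "jena-fuseki-war"}},
    2: {"v1": {"jena-fuseki"}, "v2": {"jena-fuseki2", "jena-fuseki2-war"}},
    3: {"v1": {"jena-fuseki1"}, "v2": {"jena-fuseki", "jena-fuseki-war"}},
    4: {"v1": {"jena-fuseki1"}, "v2": {"jena-fuseki2", "jena-fuseki2-war"}},
}

# An option is clash-free when v1 and v2 share no artifact id.
clash_free = {n: not (ids["v1"] & ids["v2"]) for n, ids in options.items()}

# Options that rename the existing v1 artifact id "jena-fuseki".
renames_v1 = [n for n, ids in options.items() if "jena-fuseki" not in ids["v1"]]

# Options where the id "jena-fuseki" changes meaning from v1 to v2.
reuses_old_id = [n for n, ids in options.items() if "jena-fuseki" in ids["v2"]]
```

All four options remove the clash; they differ in who has to change a dependency. Only option 3 reuses the old id for the new server, which is why the assumption about v1 not being consumed as an artifact matters.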

I'd like to do it by making Fuseki v1 "jena-fuseki1", leave 
"jena-fuseki" for Fuseki2 - option 3.  4's OK; 1 seems like trouble. 2 
isn't clear to my way of thinking.  "jena-fuseki" is a

Fuseki2 does have artifacts (WAR file; the server jar is done as an 
artifact, not a classifier addition; an embedded version sometime). 
That makes a second rename of Fuseki v2 artifact(s) less desirable.

This isn't a strongly held position.  The underlying assumption is that 
Fuseki v1 is not used as an artifact -- only as a distribution.

Of course, I can't be sure that no one outside the build uses it as 
an artifact, so if anyone thinks it's a bad idea, do say so.

There is an internal artifact use of Fuseki v1 by 
jena-jdbc-driver-remote and jena-jdbc-driver-bundle (2 each).  That 
causes a different, minor problem, which I didn't understand.  When 
resolving dependencies in release:prepare, that usage causes a "you 
have SNAPSHOTs do you want to fix them" dialog from the release plugin 
(inc dry run).  You get 2 lots of two requests to fix the SNAPSHOT 
version.  In case that was the cause of the major issue, I set it to a 
fixed 1.1.1 but it didn't change anything other than removing the 
additional dialog.

	Andy

>
> Rob
>
> On 28/02/2015 16:59, "Andy Seaborne" <an...@apache.org> wrote:
>
>> There'll be a bit of a delay in building Jena 2.13.0.
>>
>> In our setup, the release process cannot handle having multiple
>> versions of the same artifact (org.apache.jena:jena-fuseki).  Then the
>> reactor has duplicates and maven stops with an error.
>>
>> Looks like it is the release plugin.
>>
>> Whether this is because we are using an old(ish) apache parent or
>> whether it's an on-going problem, isn't clear yet.  Trying things out is
>> a slow process. Maybe a change of artifact name is needed, which itself
>> then needs checking in case that cascades in any way.  Or two build
>> cycles.
>>
>> Works:
>>    mvn -s settings.xml release:prepare -DdryRun=true
>> then fails
>>    mvn -s settings.xml release:prepare
>>
>> Bother.
>>
>> 	Andy
>>
>
>
>
>

Re: Release planning : 2.13.0

Posted by Rob Vesse <rv...@dotnetrdf.org>.

I think it is a more general limitation of Maven

Probably easiest thing is to call it jena-fuseki2 for the time being and
then at such time as 2.x is sufficiently stable to replace 1.x we can
rename again

Rob

On 28/02/2015 16:59, "Andy Seaborne" <an...@apache.org> wrote:

>There'll be a bit of a delay in building Jena 2.13.0.
>
>In our setup, the release process cannot handle having multiple
>versions of the same artifact (org.apache.jena:jena-fuseki).  Then the
>reactor has duplicates and maven stops with an error.
>
>Looks like it is the release plugin.
>
>Whether this is because we are using an old(ish) apache parent or
>whether it's an on-going problem, isn't clear yet.  Trying things out is
>a slow process. Maybe a change of artifact name is needed, which itself
>then needs checking in case that cascades in any way.  Or two build
>cycles.
>
>Works:
>   mvn -s settings.xml release:prepare -DdryRun=true
>then fails
>   mvn -s settings.xml release:prepare
>
>Bother.
>
>	Andy
>

Re: Release planning : 2.13.0

Posted by Andy Seaborne <an...@apache.org>.

There'll be a bit of a delay in building Jena 2.13.0.

In our setup, the release process cannot handle having multiple 
versions of the same artifact (org.apache.jena:jena-fuseki).  Then the 
reactor has duplicates and maven stops with an error.

Looks like it is the release plugin.

Whether this is because we are using an old(ish) apache parent or 
whether it's an on-going problem, isn't clear yet.  Trying things out is 
a slow process. Maybe a change of artifact name is needed, which itself 
then needs checking in case that cascades in any way.  Or two build cycles.

Works:
   mvn -s settings.xml release:prepare -DdryRun=true
then fails
   mvn -s settings.xml release:prepare

Bother.

	Andy

Re: Storing values in Lucene index

Posted by Andy Seaborne <an...@apache.org>.

On 27/02/15 17:09, Osma Suominen wrote:
> 27.02.2015, 18:06, Andy Seaborne wrote:
>
>>> This is inefficient if there happen to be lots of skos:altLabel values,
>>> as there are in e.g. AGROVOC thesaurus data.
>>
>> How many skos:altLabel can occur in that dataset?
>
> As an extreme example, <http://aims.fao.org/aos/agrovoc/c_1548> (the
> country Chile) has 433 altLabels. The typical case (if there's such a
> thing - it's probably a long tail distribution) is more like a dozen per
> concept. AGROVOC has terms in over 20 languages. Queries involving the
> literals tend to be a bit slow...
>
>> jena-text is a bit misnamed.  It's an entity index : "find subjects such
>> that ..."  Entity indexes make the conjunctive use cases work, "find
>> entities such that :property1 matches ... and :property2 matches ...".
>>
>> The example above is closer to a text index (query -> literal).  LARQ
>> could do both in different configurations (not at the same time), though
>> people tended to use it as a text index and then look in the RDF to make
>> it an entity index.  It can't in a single call do the conjunctive use
>> case nor be particularly easy to manage specific properties in different
>> ways.
>>
>> I have come to realise that we might provide both kinds of index
>> separately.  A tightly managed literal-text-index could have deeper
>> integration into query processing e.g. FILTER expressions.
>
> I don't oppose, but I don't really follow either. Is there something
> fundamentally wrong with the (?s ?value) text:query 'blah' query style
> that I suggested? It's not like it's unusual to store the actual values
> in a Lucene index... Lucene supports it (and Solr too), LARQ does it,
> many people do it. I understand that not all people will need it (and
> the associated size/performance costs), but it could be made optional.

I don't know if there is anything fundamentally wrong except the lack of 
conjunctive query.  A conjunctive expression can be on multiple aspects 
of an entity, i.e. multiple properties.

Example: storing (postal) addresses.  Then search on town name and 
street name in the same Lucene request.
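
A minimal sketch of what "conjunctive" means here, using a toy in-memory index in Python rather than Lucene or jena-text. The entity names, field names and values are invented for illustration; the point is only that one request constrains several properties of the same entity.

```python
# Toy in-memory stand-in for an entity index; not Lucene and not jena-text.
# The entities and field names are made up for illustration.
entity_index = {
    "ex:addr1": {"town": "bristol", "street": "high street"},
    "ex:addr2": {"town": "bristol", "street": "park row"},
    "ex:addr3": {"town": "helsinki", "street": "high street"},
}


def conjunctive_query(index, **wanted):
    """Return entities where every requested field contains its term."""
    return sorted(
        entity
        for entity, fields in index.items()
        if all(term in fields.get(name, "") for name, term in wanted.items())
    )


# One request, two properties of the same entity.
both = conjunctive_query(entity_index, town="bristol", street="high")

# A single-property request returns a superset.
town_only = conjunctive_query(entity_index, town="bristol")
```

With one document per literal instead of one per entity, the same question would need two lookups and a join on the subject outside the index.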

One case possible (not often done) is that the RDF does not hold the 
literal at all.  e.g. The entity is a large text document; the RDF holds 
the metadata.

As mentioned, reclaiming from the text index isn't possible in any 
scheme that does not reference count the entries.

The property function style is a generative index - it produces matches. 
  It can be used less efficiently

But its generality makes query planning hard.  A tightly coupled index 
that only indexes literals, maybe storing them, can have stats 
etc. maintained.

You can't use that style of generative index in a filter:

FILTER ( text:matches(?literal, 'lucene query') )

you can use it in that fashion with

?s :p ?literal .
(?x ?literal) text:query 'foo'

but the optimizer isn't going to reorder filters.

	Andy

>
> -Osma
>

Re: Storing values in Lucene index

Posted by Osma Suominen <os...@helsinki.fi>.

27.02.2015, 18:06, Andy Seaborne wrote:

>> This is inefficient if there happen to be lots of skos:altLabel values,
>> as there are in e.g. AGROVOC thesaurus data.
>
> How many skos:altLabel can occur in that dataset?

As an extreme example, <http://aims.fao.org/aos/agrovoc/c_1548> (the 
country Chile) has 433 altLabels. The typical case (if there's such a 
thing - it's probably a long tail distribution) is more like a dozen per 
concept. AGROVOC has terms in over 20 languages. Queries involving the 
literals tend to be a bit slow...

> jena-text is a bit misnamed.  It's an entity index : "find subjects such
> that ..."  Entity indexes make the conjunctive use cases work, "find
> entities such that :property1 matches ... and :property2 matches ...".
>
> The example above is closer to a text index (query -> literal).  LARQ
> could do both in different configurations (not at the same time), though
> people tended to use it as a text index and then look in the RDF to make
> it an entity index.  It can't in a single call do the conjunctive use
> case nor be particularly easy to manage specific properties in different
> ways.
>
> I have come to realise that we might provide both kinds of index
> separately.  A tightly managed literal-text-index could have deeper
> integration into query processing e.g. FILTER expressions.

I don't oppose, but I don't really follow either. Is there something 
fundamentally wrong with the (?s ?value) text:query 'blah' query style 
that I suggested? It's not like it's unusual to store the actual values 
in a Lucene index... Lucene supports it (and Solr too), LARQ does it, 
many people do it. I understand that not all people will need it (and 
the associated size/performance costs), but it could be made optional.

-Osma

-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Storing values in Lucene index

Posted by Andy Seaborne <an...@apache.org>.

On 27/02/15 06:58, Osma Suominen wrote:
> On 26/02/15 18:37, Stephen Allen wrote:
>> I would propose in the future that we actually store and not
>> just index the document so that it can be appropriately identified and
>> deleted.  This would require a change to existing Lucene databases (we
>> should provide a tool to reindex existing data).  An alternative to
>> actually storing the value would be to generate a hash of the
>> subject+predicate+object and store that as an identifier.

The same literal may be in the RDF graph multiple times.  It's a 
reference counting problem; maintaining a reference would be expensive 
and limit scale.
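
To illustrate why this is a reference counting problem, here is a toy Python sketch. It is not how jena-text or Lucene work; it just models an index that keeps one entry per literal and has to know how many triples still use it before the entry can be dropped. The class name and the example terms are hypothetical.

```python
from collections import Counter


class RefCountedTextIndex:
    """Toy index: one entry per literal, with a count of triples using it."""

    def __init__(self):
        self.refs = Counter()

    def add_triple(self, subject, predicate, literal):
        # Every triple that mentions the literal bumps its count.
        self.refs[literal] += 1

    def delete_triple(self, subject, predicate, literal):
        # The entry can only go once the last referencing triple is gone.
        self.refs[literal] -= 1
        if self.refs[literal] <= 0:
            del self.refs[literal]

    def contains(self, literal):
        return literal in self.refs


index = RefCountedTextIndex()
index.add_triple("ex:a", "rdfs:label", "foo")
index.add_triple("ex:b", "rdfs:label", "foo")  # same literal, second triple

index.delete_triple("ex:a", "rdfs:label", "foo")
still_there = index.contains("foo")  # one triple still uses it

index.delete_triple("ex:b", "rdfs:label", "foo")
gone = not index.contains("foo")
```

The count has to be updated on every add and delete, which is the extra cost referred to above.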

> I second storing the original value in the Lucene index at least as an
> option - it would obviously increase the index size, though I suspect
> the increase would be rather minor if you compare it to the overall (TDB
> + text index) database size. This would be similar to how LARQ used to
> work, though LARQ only provides access to the values, not the subject
> resources.

Slight caveat - size of index affects the speed of Lucene so it's not 
just disk space compared to the size of the TDB database.

> It would allow, with some additional code, having access to the actual
> value from the SPARQL query. Something like this:
>
> (?s ?value) text:query 'word' .
>
> Then you could also easily check that the triple actually exists in
> current RDF data (and in the current graph), with a pattern such as this:
>
> ?s rdfs:label ?value .
>
>
> For me, it would probably allow some optimization of queries that
> currently have to do a bit of detective work to find out which value
> actually matched the query. I'm currently doing queries somewhat like this:
>
> ?s text:query (skos:altLabel 'word*') .
> ?s skos:altLabel ?value .
> FILTER (STRSTARTS(?value, 'word'))
>
> This is inefficient if there happen to be lots of skos:altLabel values,
> as there are in e.g. AGROVOC thesaurus data.

How many skos:altLabel can occur in that dataset?

------
jena-text is a bit misnamed.  It's an entity index : "find subjects such 
that ..."  Entity indexes make the conjunctive use cases work, "find 
entities such that :property1 matches ... and :property2 matches ...".

The example above is closer to a text index (query -> literal).  LARQ 
could do both in different configurations (not at the same time), though 
people tended to use it as a text index and then look in the RDF to make 
it an entity index.  It can't in a single call do the conjunctive use 
case nor be particularly easy to manage specific properties in different 
ways.

I have come to realise that we might provide both kinds of index 
separately.  A tightly managed literal-text-index could have deeper 
integration into query processing e.g. FILTER expressions.

	Andy

>
> -Osma
>
>

Storing values in Lucene index Re: Release planning : 2.13.0

Posted by Osma Suominen <os...@helsinki.fi>.

On 26/02/15 18:37, Stephen Allen wrote:
> I would propose in the future that we actually store and not
> just index the document so that it can be appropriately identified and
> deleted.  This would require a change to existing Lucene databases (we
> should provide a tool to reindex existing data).  An alternative to
> actually storing the value would be to generate a hash of the
> subject+predicate+object and store that as an identifier.

I second storing the original value in the Lucene index at least as an 
option - it would obviously increase the index size, though I suspect 
the increase would be rather minor if you compare it to the overall (TDB 
+ text index) database size. This would be similar to how LARQ used to 
work, though LARQ only provides access to the values, not the subject 
resources.

It would allow, with some additional code, having access to the actual 
value from the SPARQL query. Something like this:

(?s ?value) text:query 'word' .

Then you could also easily check that the triple actually exists in 
current RDF data (and in the current graph), with a pattern such as this:

?s rdfs:label ?value .

For me, it would probably allow some optimization of queries that 
currently have to do a bit of detective work to find out which value 
actually matched the query. I'm currently doing queries somewhat like this:

?s text:query (skos:altLabel 'word*') .
?s skos:altLabel ?value .
FILTER (STRSTARTS(?value, 'word'))

This is inefficient if there happen to be lots of skos:altLabel values, 
as there are in e.g. AGROVOC thesaurus data.
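
The difference between the two query styles can be sketched with a toy model in Python. This is not Lucene: documents are plain (subject, property, literal) tuples, matching is a simple case-insensitive prefix test on the whole literal, and the subjects and labels are invented for illustration.

```python
# Toy model, not Lucene: the graph holds labels, the "index" is derived from it.
# All subjects and labels below are invented for illustration.
graph = {
    "ex:a": ["word", "wordy", "other", "another"],
    "ex:b": ["unrelated"],
}
docs = [(s, "skos:altLabel", lit) for s, labels in graph.items() for lit in labels]


def matches(literal, prefix):
    return literal.lower().startswith(prefix.lower())


def search_subjects(docs, prefix):
    """Index-only style: the index yields subjects, not the matching value."""
    return sorted({s for s, _p, lit in docs if matches(lit, prefix)})


def search_with_values(docs, prefix):
    """Stored-value style: the index yields (subject, literal) pairs."""
    return sorted((s, lit) for s, _p, lit in docs if matches(lit, prefix))


# Current style: find subjects, then re-scan every label of each subject.
subjects = search_subjects(docs, "word")
scanned = sum(len(graph[s]) for s in subjects)
refiltered = sorted(
    (s, lit) for s in subjects for lit in graph[s] if matches(lit, "word")
)

# Proposed style: the matching values come straight from the index.
direct = search_with_values(docs, "word")
```

Both routes give the same pairs, but the first has to look at every label of every matching subject, which is what hurts when a concept has hundreds of altLabels.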

-Osma

-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Release planning : 2.13.0

Posted by Chris Dollin <ch...@epimorphics.com>.

On 02/26/2015 04:37 PM, Stephen Allen wrote:

> The conjunctive query idea of Chris' is also something I would look forward
> to.  It actually looks like I may have implemented a feature that Chris
> needed, the ability to specify a custom TextDocProducer.  Chris: I would be
> interested to see your approach for this.  Are you planning on waiting
> until all statements have been inserted then querying the RDF store to
> regenerate the documents for subjects that have been changed?  How do you
> handle triple deletion?

One of my colleagues has reminded me that the pull request code doesn't
address JENA-686 itself but enables code that does do conjunctive query.
When the PR has been kicked into shape and installed into Jena (crosses
fingers) then I expect to push the conjunctive query components
(which are in a separate, Epimorphics, github repository).

Chris

-- 
"Possibly you're not recalling some of his previous plans."      Zoe, /Firefly/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)

Re: Release planning : 2.13.0

Posted by Stephen Allen <sa...@apache.org>.

On Wed, Feb 25, 2015 at 6:56 AM, Chris Dollin <ch...@epimorphics.com>
wrote:

> On 02/25/2015 11:30 AM, Andy Seaborne wrote:
>>>> Final call for Jena 2.13.0.
>
> Stephen wrote:
>
>>> I finished up and committed some outstanding changes I had for jena-text.
>>> I added the ability to specify an analyzer for the query text itself that
>>> was different than the one used for the document.  I also added some
>>> documentation explaining it on the site.
>>
>> Is there a JIRA for these changes?  I have only a superficial
>> understanding here but is any of this related to JENA-686?
>>
>> Stephen+Chris : maybe some discussion of plans and intentions on the dev@
>> list?
>
> Sure. I have some notes about what the 686 changes are about I can
> transcribe. I have been making the (originally small) changes for
> 686 compatible with master and have (rightly or wrongly) been delaying
> discussion until I had something that seemed to be sound.
>
> Right now I'm merging in the latest master changes and am expecting to
> make a pull request this PM.
>
> I'm guessing that it's unlikely the changes will be reviewed in time
> to make it into 2.13.0?
>
>
The query analyzer change is pretty separate from JENA-686, it just exposes
a capability that Lucene already has.  This is useful for example if you
are using the StandardAnalyzer to tokenize the stored document, but perhaps
you want to use one that tokenizes the query string differently.  You
already could do this with jena-text's Solr implementation, since the
configuration for that is controlled via the Solr config file.

The conjunctive query idea of Chris' is also something I would look forward
to.  It actually looks like I may have implemented a feature that Chris
needed, the ability to specify a custom TextDocProducer.  Chris: I would be
interested to see your approach for this.  Are you planning on waiting
until all statements have been inserted then querying the RDF store to
regenerate the documents for subjects that have been changed?  How do you
handle triple deletion?

I implemented the custom TextDocProducer for a slightly different reason,
which was to handle triple deletions and remove the document from the
Lucene index.  However, my triple deletion code is kind of a hack (I am
only currently indexing rdfs:label, and my application enforces a
cardinality of 1 for that property, so I can just delete all documents with
a given subject and predicate).  The index does not actually keep the value
of the document, it only indexes it, so this solution would not work in the
general case.  I would propose in the future that we actually store and not
just index the document so that it can be appropriately identified and
deleted.  This would require a change to existing Lucene databases (we
should provide a tool to reindex existing data).  An alternative to
actually storing the value would be to generate a hash of the
subject+predicate+object and store that as an identifier.
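
A minimal sketch of the hash alternative, in Python for brevity. The choice of SHA-256, the length-prefixing of each term and the function name are assumptions made for this example, not something decided in the thread; a real version would also need to settle how a literal's language tag or datatype is folded into the object string.

```python
import hashlib


def triple_id(subject, predicate, obj):
    """Stable identifier for one (subject, predicate, object) triple."""
    h = hashlib.sha256()
    for term in (subject, predicate, obj):
        data = term.encode("utf-8")
        # Length-prefix each term so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


label = "http://www.w3.org/2000/01/rdf-schema#label"
doc_id = triple_id("http://example/s", label, "a label")
same_again = triple_id("http://example/s", label, "a label")
other = triple_id("http://example/s", label, "another label")
```

The identifier would be stored as a field of the index document when the triple is added; on deletion the same hash is recomputed from the triple and used to find exactly that document, without keeping the literal itself in the index.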

Chris, I see in the JIRA that you talk about committing work to a branch,
but I can't seem to locate it.  Is this in github somewhere?

-Stephen

Re: Release planning : 2.13.0

Posted by Chris Dollin <ch...@epimorphics.com>.

On 02/25/2015 11:30 AM, Andy Seaborne wrote:
>>> Final call for Jena 2.13.0.

Stephen wrote:

>> I finished up and committed some outstanding changes I had for jena-text.  I
>> added the ability to specify an analyzer for the query text itself that was
>> different than the one used for the document.  I also added some
>> documentation explaining it on the site.
>
> Is there a JIRA for these changes?  I have only a superficial understanding here
> but is any of this related to JENA-686?
>
> Stephen+Chris : maybe some discussion of plans and intentions on the dev@ list?

Sure. I have some notes about what the 686 changes are about I can
transcribe. I have been making the (originally small) changes for
686 compatible with master and have (rightly or wrongly) been delaying
discussion until I had something that seemed to be sound.

Right now I'm merging in the latest master changes and am expecting to
make a pull request this PM.

I'm guessing that it's unlikely the changes will be reviewed in time
to make it into 2.13.0?

Chris

-- 
"Possibly you're not recalling some of his previous plans."      Zoe, /Firefly/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)