Posted to dev@jena.apache.org by Paolo Castagna <ca...@googlemail.com> on 2011/10/09 23:26:04 UTC

Roadmap? (Was: Re: Site: some trivial changes and a proposal to add "getting involved" to the horizontal navigation bar)

Hi Ian

Ian Dickinson wrote:
> On 08/10/11 20:38, Paolo Castagna wrote:
>> Hi,
>> I made some small and trivial changes to the Jena website (without
>> publishing it).
>> Feel free to change/reject any of them, if you do not agree with the
>> changes.
>>
>>
>> The page which makes me more unsatisfied is the "Jena Roadmap" (*):
> Well, since you were the main person advocating it, I assumed that you
> would fill in some content. Not quite sure how we go about creating a
> consensus roadmap though.

From my (and Talis) point of view:

 - scalability and performance (loading and querying)
 - ease of use (Fuseki did a great job in that direction)
 - better modularity and a small Jena core module
 - ability to create and keep up to date custom indexes (a la LARQ)
 - authentication and authorization (added to Fuseki?)
 - high availability (a simple master/slave replication solution would do)
 - a good solution for "geo" and SPARQL
 - TxTDB (ongoing) not to block reads while write transactions are in progress
 - a scalable inference engine(?)
 - ...

If any of these is not self-explanatory, please, say so and I can add
more details, rationale and use cases.

But, as you say, it's important to create consensus around these.

To me, consensus comes from everyday needs.

We need the things listed above to operate our services (and they are
probably shared by anyone who would want to use Apache Jena in solutions
used in a production environment).

>> http://jena.staging.apache.org/jena/about_jena/roadmap.html
>> I believe it's important to give people a sense of direction.
>> The current page IMHO, perhaps, send the wrong message (i.e. no
>> direction).
> OK, but what would you say that the project's direction is?

Make Apache Jena a solid and scalable library (and server) for everyone
wanting to build (and run) solutions or services using RDF (including,
but not limited to, people and companies (who might be running businesses
with it)).

Then, on top of this, there are the everyday problems.

For example, tomorrow, I need to load 2 billion triples|quads in TDB.
How am I going to deal with that? ;-)

Paolo

Re: Roadmap?

Posted by Paolo Castagna <ca...@googlemail.com>.

Hi Andy

Andy Seaborne wrote:
> On 12/10/11 18:05, Paolo Castagna wrote:
>> Andy Seaborne wrote:
>>> On 09/10/11 22:26, Paolo Castagna wrote:
>>>>    From my (and Talis) point of view:
>>>>
>>>>    - scalability and performance (loading and querying)
>>
>> Something like this:
>> https://issues.apache.org/jira/browse/JENA-140
>> ... goes below "scalability and performance". :-)
> 
> I don't see how copying a rambling discussion email helps.  All it does
> for me is to inhibit discussion of this or any other idea on the mailing
> list.

I don't understand why having a JIRA issue open, classified as "Wish", inhibits
discussion. It is a place where discussion can happen. Past messages on the dev
mailing list tend to be forgotten and people can miss them.

Could you please explain better why a JIRA issue classified as "Wish" inhibits
discussion?

I'll delete the issue if that is a way to continue the discussion on caching
for SPARQL endpoints, but first, I'd like to understand why it inhibits it.

I am certainly the last person who wants to inhibit discussion on anything,
especially on a caching layer for SPARQL endpoints, especially on something which
is going to improve performance or scalability. :-)

> 
>> Should we add a "caching layer for SPARQL endpoints" to our roadmap page?
> 
> There's only one goal on *my* roadmap :
> 
> * Graduate from incubator.

We should have that as well on the Roadmap.
And, we should have JIRA issues open related to that.

JIRA issues such as "Rolling out the first Apache release", with sub-tasks,
help others see what's missing and let those who are keen to help do so.

Paolo

> 
>     Andy

Re: Roadmap?

Posted by Andy Seaborne <an...@apache.org>.

On 12/10/11 18:05, Paolo Castagna wrote:
> Andy Seaborne wrote:
>> On 09/10/11 22:26, Paolo Castagna wrote:
>>>    From my (and Talis) point of view:
>>>
>>>    - scalability and performance (loading and querying)
>
> Something like this:
> https://issues.apache.org/jira/browse/JENA-140
> ... goes below "scalability and performance". :-)

I don't see how copying a rambling discussion email helps.  All it does 
for me is to inhibit discussion of this or any other idea on the mailing 
list.

> Should we add a "caching layer for SPARQL endpoints" to our roadmap page?

There's only one goal on *my* roadmap :

* Graduate from incubator.

	Andy

Re: Roadmap?

Posted by Paolo Castagna <ca...@googlemail.com>.

Andy Seaborne wrote:
> On 09/10/11 22:26, Paolo Castagna wrote:
>>  From my (and Talis) point of view:
>>
>>   - scalability and performance (loading and querying)

Something like this:
https://issues.apache.org/jira/browse/JENA-140
... goes below "scalability and performance". :-)

Should we add a "caching layer for SPARQL endpoints" to our roadmap page?

Paolo

>>   - ease of use (Fuseki did a great job in that direction)
>>   - better modularity and a small Jena core module
>>   - ability to create and keep up to date custom indexes (a la LARQ)
>>   - authentication and authorization (added to Fuseki?)
>>   - high availability (a simple master/slave replication solution
>> would do)
>>   - a good solution for "geo" and SPARQL
>>   - TxTDB (ongoing) not to block reads while write transactions are in
>> progress
>>   - a scalable inference engine(?)
>>   - ...
> 
> There are three axes: time, resources and functionality.  It's not a
> free choice: choose two and the third is determined.  Other limitations
> may apply.
> 
> Making tiny incremental steps on too many fronts doesn't lead to real
> progress.
> 
> What are the priorities for Talis and how much time&resource is Talis
> prepared to contribute to each priority?
> 
>     Andy

Re: Roadmap?

Posted by Paolo Castagna <ca...@googlemail.com>.

Hi Andy

Andy Seaborne wrote:
> On 09/10/11 22:26, Paolo Castagna wrote:
>>  From my (and Talis) point of view:
>>
>>   - scalability and performance (loading and querying)
>>   - ease of use (Fuseki did a great job in that direction)
>>   - better modularity and a small Jena core module
>>   - ability to create and keep up to date custom indexes (a la LARQ)
>>   - authentication and authorization (added to Fuseki?)
>>   - high availability (a simple master/slave replication solution
>> would do)
>>   - a good solution for "geo" and SPARQL
>>   - TxTDB (ongoing) not to block reads while write transactions are in
>> progress
>>   - a scalable inference engine(?)
>>   - ...
> 
> There are three axes: time, resources and functionality.  It's not a
> free choice: choose two and the third is determined.  Other limitations
> may apply.

The roadmap isn't a definitive commitment and it does not necessarily need
to explicitly include (on the HTML page) considerations on time or resources.

The current page already lists a couple of items which are desired or expected
but not yet underway:

 - OWL 2 support in ontology API
 - migrate package names from com.hp to org.apache

"Not yet underway" does not create expectations (which is good), but it
communicates that those are desired features/additions.

IMHO, a couple of good examples of project roadmaps are:

 - https://cwiki.apache.org/confluence/display/WHIRR/RoadMap
 - http://code.google.com/p/google-refine/wiki/Roadmap

There are plenty more, and I find them useful.

Some projects communicate progress using roadmaps in JIRA:

 - https://issues.apache.org/jira/browse/WHIRR#selectedTab=com.atlassian.jira.plugin.system.project%3Aroadmap-panel

I find that useful too for the projects I use/depend on.

It quickly and clearly gives me a rough sense of how far off the next
release is and what I should expect in it.

It also gives a hint to people who might want to help move faster towards
a release about where their help is needed most.

Last but not least, it gives people a sense of direction.

> 
> Making tiny incremental steps on too many fronts doesn't lead to real
> progress.

I agree.

> 
> What are the priorities for Talis and how much time&resource is Talis
> prepared to contribute to each priority?

It's not just about Talis.
It's about Apache Jena, independently from Talis.

For example, a few items I mentioned in the list above are not strictly
speaking needed by Talis in the short or mid-term future:

 - ability to create and keep up to date custom indexes (a la LARQ)
 - authentication and authorization (added to Fuseki?)
 - high availability (a simple master/slave replication solution would do)

We did not find a solution for these problems out of the box and we needed
to develop our own.

However, I imagine that most of the people wanting to use Fuseki in production
would need to think about how to secure access to their data or how to guarantee
high availability (or improve/increase availability).

A first step could be to agree on a list.

Then we could divide the items of that list into two groups: short/mid-term
and long-term/distant future.

An example of an item I would include in the short/mid-term group is:

 - migrate package names from com.hp to org.apache

From Talis's point of view: scalability (*) and performance of TDB and ARQ,
together with transactions (now that TxTDB has become TDB), are certainly at
the top of the list.

Things I am personally interested in and keen to work on are:

 - ability to create and keep up to date custom indexes (a la LARQ)
 - a good solution for "geo" and SPARQL

Paolo

PS:
(*) I still have a 1 billion triples|quads dataset to load into TDB.

> 
>     Andy

Re: Roadmap?

Posted by Andy Seaborne <an...@apache.org>.

On 09/10/11 22:26, Paolo Castagna wrote:
>  From my (and Talis) point of view:
>
>   - scalability and performance (loading and querying)
>   - ease of use (Fuseki did a great job in that direction)
>   - better modularity and a small Jena core module
>   - ability to create and keep up to date custom indexes (a la LARQ)
>   - authentication and authorization (added to Fuseki?)
>   - high availability (a simple master/slave replication solution would do)
>   - a good solution for "geo" and SPARQL
>   - TxTDB (ongoing) not to block reads while write transactions are in progress
>   - a scalable inference engine(?)
>   - ...

There are three axes: time, resources and functionality.  It's not a 
free choice: choose two and the third is determined.  Other limitations 
may apply.

Making tiny incremental steps on too many fronts doesn't lead to real 
progress.

What are the priorities for Talis and how much time&resource is Talis 
prepared to contribute to each priority?

	Andy