Posted to dev@rya.apache.org by Brad Rushworth <br...@remote.com.au> on 2020/07/06 09:12:24 UTC

Rya Roadmap

Hi everyone!

My name is Brad and I'm based in Australia. I've been developing Rya
full-time for a few months now as part of a comprehensive evaluation of
Semantic Web technologies, and Rya in particular, for our organisation.
We're experienced users of Accumulo. We had some policy issues to
overcome with regard to contributing to the Apache project, but those are
now resolved. I've been in contact with Adina along the way.

Rya seems pretty awesome, but it is held back by a lack of
documentation, some unclean code and a few rough edges in getting
started. For example, we could hook it up with Fluo Muchos to make it
super easy for new people to spin up a working Rya cluster on AWS or
Azure. My impression of Rya is that it is quite feature-complete,
but needs some work to be much more friendly to new adopters.

I put up a pull request last week that updated the Maven dependencies of
the project. Any help reviewing that would be appreciated. I know you're 
all busy so there is no great rush, but I'd love to collaborate and hear 
your priorities too.

I'm about 70 commits deep into my work on Rya in our organisation's code 
repository, so I've been pretty busy. I'm now trying to finalise some 
changes. I've been testing the performance against the original code in 
a small test cluster, and for some queries I've made Rya much faster, 
and for others, slower. I'm working on more changes which I think should 
improve it further. I've started testing against the LUBM 5000 dataset, 
DBPedia and OpenPermID.

I'm new to the world of Semantic Web but fortunately I have some
experienced colleagues helping me along the way. I've been marking
tickets in Jira as I work on them, and I'm trying to publish my pull
requests onto GitHub faster. Hopefully a bunch will start appearing soon.

Please expect a large pull request soon that changes Rya to use data
types that align better with RDF4J, but otherwise doesn't change
functionality. I have a refactor of the Accumulo DAO that is cleaner and,
once finished, hopefully much faster. I have fixed a number of other
tickets and improved some of the doco and configuration files. I'll try
to make the pull requests clean and reviewable, but unfortunately many
of the improvements I'm making depend on other improvements I've made,
so it's a bit tricky to disentangle.

Some improvements I'll be putting up shortly also include:
Enhance accumulo.rya to support the use of bloom filter
Make timeout for SPARQL query configurable
Add an IPAddressRyaTypeResolver
NumberFormatException for large integers
Tomcat configuration for indexers
etc

If anyone with more Rya experience wants to request particular features
or functionality to be worked on, I'd love to hear from you. We're
particularly interested in scaling Rya to very large data sets (thus
performance is very important to us) and making Rya more generic in
reading from other (pre-existing) Accumulo table layouts. I also want to
fix reliability issues around indexing configuration and consistency of
tables (for example, is there a MapReduce job that repairs the indexes
if data is written from a misconfigured client?).

I hope to hear from you and to learn your thoughts on the future direction of Rya.

Brad

Re: Rya Roadmap

Posted by Brad Rushworth <br...@remote.com.au>.

Hi everyone,

I've created a draft PR:
https://github.com/apache/rya/pull/317

I'm after some background, opinions and feedback about how to improve 
the environment.properties space in Rya.

I'm not really clear where this should end up, so please let me know 
your thoughts.

Brad


Re: Rya Roadmap

Posted by Christopher <ct...@apache.org>.

Hi Brad,

That seems like good stuff.

I just want to add a caveat for Muchos, because you mentioned it:
Muchos is not a released ASF product, and has not met the normal
standards of release (such as having been vetted and voted on by the
Fluo PMC). It is primarily used internally by the Fluo (and Accumulo)
committers to aid in development, but should not be recommended or
promoted outside the development communities until it receives an
official release, as per ASF standards and norms.

That said, there have been suggestions here and there to do a vote for
an official release of Muchos, but it has not yet happened.

Christopher
(Fluo PMC member)


Re: Rya Roadmap

Posted by David Lotts <dl...@gmail.com>.

Wow, thank you Brad!  Your dependency upgrade PR represents efforts that
are career-questioningly tedious, and yet immensely vital to our project's
progress!  Thank you!  I am looking through all the changes now.

> If anyone with more Rya experience wants to request particular features
> or functionality to be worked on, I'd love to hear from you
Of course you know you can find lots of feature requests in Jira.  Wishes
sort first when ordered by "Issue Type":
[
https://issues.apache.org/jira/projects/RYA/issues/RYA-38?filter=allopenissues&orderby=issuetype+DESC%2C+cf%5B12310090%5D+ASC%2C+cf%5B12310220%5D+ASC%2C+priority+DESC%2C+updated+DESC
]


Glad to have you on board as a contributor!  Looking forward to seeing what
else you have in the works.
david.