Posted to dev@rya.apache.org by Brad Rushworth <br...@remote.com.au> on 2020/07/06 09:12:24 UTC
Rya Roadmap
Hi everyone!
My name is Brad and I'm based in Australia. I've been developing Rya for
a few months now full-time as part of a comprehensive evaluation of
Semantic Web technologies and in particular Rya, for our organisation.
We're experienced users of Accumulo. We had some policy issues to
overcome with regard to contributing to the Apache project, but those are
now resolved. I've been in contact with Adina along the way.
Rya seems pretty awesome, but it is held back by a lack of
documentation, some unclean code and a few rough edges to getting
started. For example, we could hook it up with Fluo Muchos to make it
super easy for new people to spin up a working Rya cluster on an AWS or
Azure cloud. My impression of Rya is that it is quite feature complete,
but needs some work to be much more friendly to new adopters.
I put up a pull request last week that updated the Maven dependencies of
the project. Any help reviewing that would be appreciated. I know you're
all busy so there is no great rush, but I'd love to collaborate and hear
your priorities too.
I'm about 70 commits deep into my work on Rya in our organisation's code
repository, so I've been pretty busy. I'm now trying to finalise some
changes. I've been testing the performance against the original code in
a small test cluster, and for some queries I've made Rya much faster,
and for others, slower. I'm working on more changes which I think should
improve it further. I've started testing against the LUBM 5000 dataset,
DBpedia and OpenPermID.
I'm new to the world of Semantic Web but fortunately I have some
experienced colleagues helping me along the way. I've been marking
tickets in Jira as I work on them, and I'm trying to publish my pull
requests onto GitHub faster. Hopefully a bunch will start appearing soon.
Please expect a large pull request soon that changes Rya to use data
types that align better with RDF4J, but otherwise doesn't change
functionality. I have a refactor of the Accumulo DAO that is cleaner and
(once finished, hopefully much) faster. I have fixed a number of other
tickets and improved some of the doco and configuration files. I'll try
to make the pull requests clean and reviewable, but unfortunately many
of the improvements I'm making depend on other improvements I've made,
so it's a bit tricky to disentangle.
Some improvements I'll be putting up shortly also include:
Enhance accumulo.rya to support the use of bloom filter
Make timeout for SPARQL query configurable
Add an IPAddressRyaTypeResolver
NumberFormatException for large integers
Tomcat configuration for indexers
etc
If anyone with more Rya experience wants to request particular features
or functionality to be worked on, I'd love to hear from you. We're
particularly interested in scaling Rya to very large data sets (thus
performance is very important to us) and making Rya more generic in
reading from other (pre-existing) Accumulo table layouts. I also want to
fix reliability issues around indexing configuration and consistency of
tables (for example, is there a MapReduce job that repairs the indexes
if data is written from a misconfigured client?).
I hope to hear from you, and to hear your thoughts on the future direction of Rya.
Brad
Re: Rya Roadmap
Posted by Brad Rushworth <br...@remote.com.au>.
Hi everyone,
I've created a draft PR:
https://github.com/apache/rya/pull/317
I'm after some background, opinions and feedback about how to improve
the environment.properties space in Rya.
I'm not really clear where this should end up, so please let me know
your thoughts.
Brad
Re: Rya Roadmap
Posted by Christopher <ct...@apache.org>.
Hi Brad,
That seems like good stuff.
I just want to add a caveat for Muchos, because you mentioned it:
Muchos is not a released ASF product, and has not met the normal
standards of release (such as having been vetted and voted on by the
Fluo PMC). It is primarily used internally by the Fluo (and Accumulo)
committers to aid in development, but should not be recommended or
promoted outside the development communities until it receives an
official release, as per ASF standards and norms.
That said, there have been suggestions here and there to do a vote for
an official release of Muchos, but it has not yet happened.
Christopher
(Fluo PMC member)
Re: Rya Roadmap
Posted by David Lotts <dl...@gmail.com>.
Wow, thank you Brad! Your dependency upgrade PR represents efforts that
are career-questioningly tedious, and yet immensely vital to our project's
progress! Thank you! I am looking through all the changes now.
> If anyone with more Rya experience wants to request particular features
> or functionality to be worked on, I'd love to hear from you
Of course you know you can find lots of feature requests in Jira. Wishes
sort first on "Issue Type":
[
https://issues.apache.org/jira/projects/RYA/issues/RYA-38?filter=allopenissues&orderby=issuetype+DESC%2C+cf%5B12310090%5D+ASC%2C+cf%5B12310220%5D+ASC%2C+priority+DESC%2C+updated+DESC
]
Glad to have you on board as a contributor! Looking forward to seeing what
else you have in the works.
david.