Posted to dev@rya.apache.org by Boris Pelakh <bo...@semanticarts.com> on 2018/10/30 21:15:23 UTC

Free text indexing, using Prospector to improve queries.

I have continued my experiment with single-node Rya and have a couple of questions. I am hoping for some guidance before digging into the code:

1. I am still trying to turn on free text indexing. I have added the following to my environment.properties:

sc.use_freetext=true
sc.freetext.predicates=http://www.w3.org/2000/01/rdf-schema#label

However, the index tables were neither created on restart nor populated with data on INSERT.
2. Once I do manage to populate the data, I remember I had to use a special FILTER function, something like fts:search, to hit the index. Furthermore, a special Accumulo batch scanner had to be deployed into the tablet server. Are there instructions anywhere detailing these steps?
3. I have run the Prospector MR job and my rya_prospects table has been populated with frequency counts on my various literals. How do I direct Rya to take advantage of this information during query planning? Is there another environment setting, or do I specify something per-query? The information does not appear to be used out of the box.

I was also curious whether there is any work in progress on Rya, or an overall development plan. For starters, that could mean upgrading to Accumulo 2.x/Hadoop 3.x (assuming Accumulo 2.x gets out of beta soon), but also general improvements, such as creating a fully compliant RESTful endpoint. The company I am working with now might be able to underwrite some of this effort, but it would be better if there were a shared vision that everyone could agree on.

Thanks,
Boris