Posted to dev@jena.apache.org by xristy <gi...@git.apache.org> on 2017/12/28 19:24:29 UTC
[GitHub] jena pull request #335: Jena 1453 reduce docs
GitHub user xristy opened a pull request:
https://github.com/apache/jena/pull/335
Jena 1453 reduce docs
resolves Jena 1453.
Removes redundant fields from Lucene documents.
Adds graph output arg to TextQueryPF
Adds unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BuddhistDigitalResourceCenter/jena JENA-1453-reduce-docs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/jena/pull/335.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #335
----
commit 2dc5008bfb0f481549137ee41f151a38ebf4e04b
Author: Chris Tomlinson <ct...@...>
Date: 2017-12-27T17:03:18Z
resolve JENA 1453 ; reduce Lucene docs; add graph output arg
commit 66c842db42e42687beb8e2dad817bc605546d62f
Author: Chris Tomlinson <ct...@...>
Date: 2017-12-28T19:19:46Z
added unit tests for jena-text and graphs
----
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:
https://github.com/apache/jena/pull/335
Presumably this was ready to merge. I've noted the announcement text as well, thanks, and will sort out the documentation (unless someone beats me to it).
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by xristy <gi...@git.apache.org>.
Github user xristy commented on the issue:
https://github.com/apache/jena/pull/335
Yes it was ready to merge. The documentation updates are queued in the anonymous "improve this page" commit that I made a week ago.
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:
https://github.com/apache/jena/pull/335
Does this change the on-disk format? I don't think it does, but confirmation would be good.
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:
https://github.com/apache/jena/pull/335
Documentation changes applied. Thanks!
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by xristy <gi...@git.apache.org>.
Github user xristy commented on the issue:
https://github.com/apache/jena/pull/335
Happy to. Here's a brief statement:
This release includes updates to the Jena Lucene integration that reduce the size of the documents indexed by Lucene and of the resulting indexes. Re-indexing is not necessary, as the changes are compatible with existing indexes. Additionally, there is an optional output argument for `text:query` that allows retrieving the graph that contains a result triple. See the updated [jena text documentation](http://jena.apache.org/documentation/query/text-query.html).
---
[GitHub] jena pull request #335: Jena 1453 reduce docs
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/jena/pull/335
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:
https://github.com/apache/jena/pull/335
This is a naive question, so feel free to answer it as briefly as it deserves, but since for some instances reindexing may be a really considerable amount of work, would it be useful/possible to include a very simple migrator class that can strip out the no-longer-used fields and produce a new leaner text index that could then be swapped in?
---
[GitHub] jena pull request #335: Jena 1453 reduce docs
Posted by afs <gi...@git.apache.org>.
Github user afs commented on a diff in the pull request:
https://github.com/apache/jena/pull/335#discussion_r159427449
--- Diff: jena-text/src/test/java/org/apache/jena/query/text/TestTextGraphIndexExtra2.java ---
@@ -0,0 +1,335 @@
+/**
--- End diff --
Minor: license headers aren't javadoc, so they should start with `/*`, not `/**`.
The codebase seems to have quite a few `/**` license headers all over the place, not just in this PR.
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by xristy <gi...@git.apache.org>.
Github user xristy commented on the issue:
https://github.com/apache/jena/pull/335
@ajs6f I haven't thought about trying to do an in-place update of a text index; however, perhaps one could use jena/textindexer with an assembler file modified to create a Lucene index off-to-the-side and then halt the main server and swap in the new index.
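The last step of that suggestion, swapping a rebuilt index in while the server is halted, is plain file-system work. A minimal sketch follows, assuming the new index was built into a sibling directory on the same file system (so `Files.move` is a rename) and that the server is stopped; the `IndexSwap` class and the directory names are hypothetical and not part of Jena.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of Jena): swaps a freshly built Lucene index
 * directory into the location a stopped server reads from, keeping the old
 * index as a backup until the new one has been checked.
 */
public class IndexSwap {

    /** Moves live -> backup, then rebuilt -> live. Run only while the server is halted. */
    static void swap(Path live, Path rebuilt, Path backup) throws IOException {
        if (!Files.isDirectory(rebuilt)) {
            throw new IOException("Rebuilt index not found: " + rebuilt);
        }
        if (Files.exists(backup)) {
            throw new IOException("Backup location already in use: " + backup);
        }
        // Keep the old index around rather than deleting it.
        Files.move(live, backup);
        try {
            Files.move(rebuilt, live);
        } catch (IOException e) {
            // Roll back so the server can be restarted on the old index.
            Files.move(backup, live);
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on temporary directories standing in for real index locations.
        Path base = Files.createTempDirectory("text-index-demo");
        Path live = Files.createDirectory(base.resolve("lucene"));
        Path rebuilt = Files.createDirectory(base.resolve("lucene-rebuilt"));
        Files.writeString(rebuilt.resolve("marker"), "new");
        swap(live, rebuilt, base.resolve("lucene-old"));
        System.out.println(Files.exists(live.resolve("marker")));
    }
}
```

Keeping the old directory instead of deleting it means a bad rebuild can be undone by reversing the two moves.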
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by xristy <gi...@git.apache.org>.
Github user xristy commented on the issue:
https://github.com/apache/jena/pull/335
Assuming the PR is accepted, I'll update the jena-text documentation to reflect the additional output argument for `text:query`. An example is:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
select ?g ?s ?lit ?sc
where {
  (?s ?sc ?lit ?g) text:query (skos:altLabel "one" 100 "lang:en") .
}
```
where `?g` reports the graph in which each matching triple occurs. This is likely to be noticeably faster than iterating over all graphs or collecting the graph URIs after the fact.
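Because each result row carries its graph in `?g`, grouping hits per graph becomes a single pass over the results. A minimal sketch, assuming the rows have already been read out of the query results (for example from Jena's `ResultSet`); the `TextHit` record is a hypothetical stand-in for one `(?s ?sc ?lit ?g)` row, not a Jena type, and the URIs are placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class HitsByGraph {

    /** Hypothetical stand-in for one (?s ?sc ?lit ?g) row; not a Jena class. */
    record TextHit(String subject, float score, String literal, String graph) {}

    /** Groups hits by the graph reported in ?g, sorted by graph URI. */
    static Map<String, List<TextHit>> byGraph(List<TextHit> hits) {
        return hits.stream()
                .collect(Collectors.groupingBy(TextHit::graph, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<TextHit> hits = List.of(
                new TextHit("http://example.org/a", 1.2f, "one", "http://example.org/g1"),
                new TextHit("http://example.org/b", 0.9f, "one more", "http://example.org/g2"),
                new TextHit("http://example.org/c", 0.7f, "only one", "http://example.org/g1"));
        // Print how many hits each graph contributed.
        byGraph(hits).forEach((g, hs) -> System.out.println(g + " -> " + hs.size()));
    }
}
```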
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by xristy <gi...@git.apache.org>.
Github user xristy commented on the issue:
https://github.com/apache/jena/pull/335
Ah! I should have written about this.
Upgrading to this PR does not affect an existing text index.
The changes to the indexed Lucene documents affect only triples added after the upgrade: their documents will have fewer fields and, if the graph field is enabled, a single stored graph field rather than several instances.
This PR removes redundant or unreferenced fields when indexing new triple documents.
The triple/document deletion functionality will behave as before if it was enabled when the text index was created.
The graph return feature will work with older indexes provided that the graph field was enabled when the text index was created.
Re-indexing should generally reduce the size of the text index.
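The reduction described above (dropping unreferenced fields and keeping a single graph field instead of several instances) can be pictured with a toy model. This sketch is hypothetical: it does not use the Lucene API, and the field names (`uri`, `graph`, `label`, `unused`) are illustrative assumptions rather than jena-text's actual document layout. A document is modelled as an ordered list of (name, value) pairs because Lucene allows a field name to repeat within a document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReduceDoc {

    // Hypothetical graph field name, for illustration only.
    static final String GRAPH_FIELD = "graph";

    /** Keeps only fields named in `retained`, and at most one instance of the graph field. */
    static List<Map.Entry<String, String>> reduce(List<Map.Entry<String, String>> doc, Set<String> retained) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        boolean graphSeen = false;
        for (Map.Entry<String, String> field : doc) {
            String name = field.getKey();
            if (!retained.contains(name)) {
                continue; // unreferenced field: drop it
            }
            if (name.equals(GRAPH_FIELD)) {
                if (graphSeen) {
                    continue; // repeated graph instance: keep only the first
                }
                graphSeen = true;
            }
            out.add(field);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> doc = List.of(
                Map.entry("uri", "http://example.org/s"),
                Map.entry("graph", "http://example.org/g"),
                Map.entry("graph", "http://example.org/g"),
                Map.entry("label", "one"),
                Map.entry("unused", "x"));
        System.out.println(reduce(doc, Set.of("uri", "graph", "label")));
    }
}
```

In the PR itself the leaner layout is produced when new triples are indexed; the sketch only illustrates the shape of the change.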
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by ajs6f <gi...@git.apache.org>.
Github user ajs6f commented on the issue:
https://github.com/apache/jena/pull/335
@xristy That's what I meant; I didn't mean to suggest an in-place operation, sorry.
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by xristy <gi...@git.apache.org>.
Github user xristy commented on the issue:
https://github.com/apache/jena/pull/335
I've submitted an update to the jena-text documentation to reflect the graph output argument for `text:query`.
---
[GitHub] jena issue #335: Jena 1453 reduce docs
Posted by afs <gi...@git.apache.org>.
Github user afs commented on the issue:
https://github.com/apache/jena/pull/335
A few words for the release announcement would be most helpful, especially noting that indexes do not need to be rebuilt but that, if they are, they will be smaller.
---