
Posted to dev@jena.apache.org by xristy <gi...@git.apache.org> on 2017/12/28 19:24:29 UTC

[GitHub] jena pull request #335: Jena 1453 reduce docs

GitHub user xristy opened a pull request:

    https://github.com/apache/jena/pull/335

    Jena 1453 reduce docs

    Resolves JENA-1453.
    
    Removes redundant fields from Lucene documents.
    
    Adds a graph output argument to TextQueryPF.
    
    Adds unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BuddhistDigitalResourceCenter/jena JENA-1453-reduce-docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #335
    
----
commit 2dc5008bfb0f481549137ee41f151a38ebf4e04b
Author: Chris Tomlinson <ct...@...>
Date:   2017-12-27T17:03:18Z

    resolve JENA 1453 ; reduce Lucene docs; add graph output arg

commit 66c842db42e42687beb8e2dad817bc605546d62f
Author: Chris Tomlinson <ct...@...>
Date:   2017-12-28T19:19:46Z

    added unit tests for jena-text and graphs

----


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/335
  
    Presumably this was ready to merge. I've noted the announcement text as well, thanks, and will sort out the documentation (unless someone beats me to it).
    



---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/335
  
    Yes it was ready to merge. The documentation updates are queued in the anonymous "improve this page" commit that I made a week ago.


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/335
  
    Does this change the on-disk format? I don't think it does, but confirmation would be good.


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/335
  
    Documentation changes applied. Thanks!


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/335
  
    Happy to. Here's a brief statement:
    
    This release includes updates to the Jena Lucene integration that reduce the size of the documents indexed by Lucene and the size of the resulting indexes. Re-indexing is not necessary, as the changes are compatible with existing indexes. Additionally, there is an optional output argument for `text:query` that makes it possible to retrieve the graph that contains a result triple. See the updated [jena text documentation](http://jena.apache.org/documentation/query/text-query.html).


---

[GitHub] jena pull request #335: Jena 1453 reduce docs

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/335


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by ajs6f <gi...@git.apache.org>.

Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/335
  
    This is a naive question, so feel free to answer it as briefly as it deserves, but since for some instances reindexing may be a really considerable amount of work, would it be useful/possible to include a very simple migrator class that can strip out the no-longer-used fields and produce a new leaner text index that could then be swapped in?


---

[GitHub] jena pull request #335: Jena 1453 reduce docs

Posted by afs <gi...@git.apache.org>.

Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/335#discussion_r159427449
  
    --- Diff: jena-text/src/test/java/org/apache/jena/query/text/TestTextGraphIndexExtra2.java ---
    @@ -0,0 +1,335 @@
    +/**
    --- End diff --
    
    Minor - license headers aren't javadoc so they should be `/*` not `/**`
    
    The codebase seems to have quite a few `/**` headers all over the place, not just in this PR.
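
A minimal sketch of the distinction raised here, for illustration only: a header that opens with `/**` is treated as Javadoc, while `/*` is a plain block comment. The class and method names below are hypothetical and are not part of Jena or of this PR.

```java
// Hypothetical helper, not part of Jena: detects and fixes a license header
// that was opened with "/**" (Javadoc) instead of "/*" (plain block comment).
final class LicenseHeaderCheck {

    // True when the first non-blank text of the source opens a Javadoc comment.
    static boolean startsWithJavadoc(String source) {
        String s = source.replaceFirst("^\\s+", "");
        // "/**/" is an empty block comment, not Javadoc.
        return s.startsWith("/**") && !s.startsWith("/**/");
    }

    // Rewrites a leading "/**" to "/*"; any other source is returned unchanged.
    static String fixHeader(String source) {
        if (!startsWithJavadoc(source)) {
            return source;
        }
        int at = source.indexOf("/**");
        return source.substring(0, at) + "/*" + source.substring(at + 3);
    }

    public static void main(String[] args) {
        String src = "/**\n * Licensed to the Apache Software Foundation\n */\nclass A {}\n";
        System.out.print(fixHeader(src));
    }
}
```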


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/335
  
    @ajs6f I haven't thought about trying to do an in-place update of a text index; however, perhaps one could use jena/textindexer with an assembler file modified to create a Lucene index off-to-the-side and then halt the main server and swap in the new index.
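
The swap step of that suggestion could look roughly like the sketch below. It assumes the server is already halted, that the rebuilt index was written to a separate directory on the same filesystem, and that the old index should be kept as a backup. The class name and all three paths are hypothetical; nothing here is a Jena API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Jena: swaps a rebuilt Lucene index directory
// into place while the server is stopped, keeping the old index as a backup.
final class TextIndexSwap {

    static void swap(Path liveIndex, Path rebuiltIndex, Path backup) throws IOException {
        if (!Files.isDirectory(liveIndex)) {
            throw new IOException("no live index at " + liveIndex);
        }
        if (!Files.isDirectory(rebuiltIndex)) {
            throw new IOException("no rebuilt index at " + rebuiltIndex);
        }
        if (Files.exists(backup)) {
            throw new IOException("backup location already exists: " + backup);
        }
        // Move the old index out of the way first.
        Files.move(liveIndex, backup);
        try {
            // Put the rebuilt index where the server expects to find it.
            Files.move(rebuiltIndex, liveIndex);
        } catch (IOException e) {
            // Roll back so the server can still start with the old index.
            Files.move(backup, liveIndex);
            throw e;
        }
    }
}
```

Keeping the old directory as a backup means the previous index can be restored by reversing the two moves if the rebuilt one turns out to be unusable.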


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/335
  
    Assuming the PR is accepted I'll update the jena-text doc to reflect that there is an additional output arg for `text:query`. An example is:
    
        select ?g ?s ?lit ?sc
        where {
           (?s ?sc ?lit ?g) text:query (skos:altLabel "one" 100 "lang:en") .
        }
    
    where the `?g` reports the graph in which the matching triples occur. This is likely to be rather more performant than iterating over all graphs or collecting the graph URIs after the fact.
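
For completeness, a small sketch that assembles the example above into a full query string, including the `text:` and `skos:` prefix declarations the snippet leaves implicit. The class and method names are hypothetical; the code only builds the text and does not execute it against a dataset.

```java
// Hypothetical helper, not part of Jena: builds the SPARQL text for a
// text:query call that also binds the graph (?g) of each matching triple.
final class GraphTextQuery {

    static final String PREFIXES =
        "PREFIX text: <http://jena.apache.org/text#>\n"
        + "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n";

    static String build(String property, String searchText, int limit, String lang) {
        // Escape backslashes and double quotes so the search text stays one literal.
        String escaped = searchText.replace("\\", "\\\\").replace("\"", "\\\"");
        return PREFIXES
            + "SELECT ?g ?s ?lit ?sc\n"
            + "WHERE {\n"
            + "  (?s ?sc ?lit ?g) text:query (" + property + " \"" + escaped + "\" "
            + limit + " \"lang:" + lang + "\") .\n"
            + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(build("skos:altLabel", "one", 100, "en"));
    }
}
```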


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/335
  
    Ah! I should have written on this.
    
    Upgrading to this PR does not affect an existing text index.
    
    The changes to the Lucene documents that are indexed will affect only triples that are added: their documents will have fewer fields and, if the graph field is enabled, a single stored graph field will be present rather than several instances.
    
    This PR removes redundant or unreferenced fields when indexing new triple documents.
    
    The triple/document deletion functionality will behave as before if it was enabled when the text index was created.
    
    The graph return feature will function with older indexes provided that the graph field was enabled when the text index was created.
    
    Re-indexing should generally reduce the size of the text index.
    



---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by ajs6f <gi...@git.apache.org>.

Github user ajs6f commented on the issue:

    https://github.com/apache/jena/pull/335
  
    @xristy That's what I meant-- I didn't mean to suggest an in-place op, sorry.


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by xristy <gi...@git.apache.org>.

Github user xristy commented on the issue:

    https://github.com/apache/jena/pull/335
  
    I've submitted an update to the jena-text documentation to reflect the graph output argument for `text:query`.


---

[GitHub] jena issue #335: Jena 1453 reduce docs

Posted by afs <gi...@git.apache.org>.

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/335
  
    A few words for the release announcement would be most helpful, especially about not needing to rebuild indexes, but that if you do, they are smaller.



---