Posted to dev@jena.apache.org by osma <gi...@git.apache.org> on 2015/06/26 15:50:20 UTC

[GitHub] jena pull request: jena-text stored literals: initial functionalit...

GitHub user osma opened a pull request:

    https://github.com/apache/jena/pull/81

    jena-text stored literals: initial functionality and tests for Lucene

    This PR makes it possible to store the original literal values in the jena-text Lucene index and to retrieve them when querying the index. It works like this:
    
    1) Configure jena-text to store literals (default is off) using the new `text:storeValues` setting. Note that you also need the `text:langField` setting in the entity map for language tags and datatypes to be handled correctly.
    
    ```
    <#indexLucene> a text:TextIndexLucene ;
        #text:directory <file:Lucene> ;
        text:directory "mem" ;
        text:storeValues true ;
        text:entityMap <#entMap> ;
        .
    
    <#entMap> a text:EntityMap ;
        text:entityField "uri" ;
        text:langField "lang" ;
        [...]
    ```
    
    2) Add some data, say this triple:
    
    ```
    :myresource rdfs:label "My resource"@en .
    ```
    
    3) Query like this:
    
    ```
    SELECT * {
      (?s ?score ?literal) text:query "resource" .
    }
    ```
    
    In the query result, `?literal` will be bound to `"My resource"@en`.
    
    It also works with typed literals (as requested by @ehedgehog). The datatype will be stored in the langField using a special prefix (currently `^^`), which ensures that it cannot be mistaken for a language tag.
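    A minimal Java sketch of the langField convention described above. The class `LangFieldCodec`, its method names and the N-Triples-style output string are hypothetical illustrations, not part of jena-text. A language tag is stored as-is and a datatype IRI gets the `^^` prefix; since BCP 47 language tags consist only of letters, digits and hyphens, a `^^`-prefixed value can never be read as a language tag. Treating `xsd:string` like a simple literal is an assumption made here to follow RDF 1.1.
    
    ```java
    import java.util.Objects;
    
    /**
     * Hypothetical helper (not part of jena-text) illustrating the langField
     * convention: a language tag is stored as-is, a datatype IRI is stored
     * with a "^^" prefix, and a simple literal stores nothing.
     */
    final class LangFieldCodec {
        static final String DATATYPE_PREFIX = "^^";
        static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
    
        private LangFieldCodec() {}
    
        /** Returns the value to put in the langField, or null if none is needed. */
        static String encode(String lang, String datatypeIri) {
            if (lang != null && !lang.isEmpty()) {
                return lang;                                  // e.g. "en"
            }
            if (datatypeIri != null && !XSD_STRING.equals(datatypeIri)) {
                return DATATYPE_PREFIX + datatypeIri;         // e.g. "^^http://...#integer"
            }
            return null;                                      // simple literal
        }
    
        /** Rebuilds an N-Triples-style literal from the stored lexical value and langField value. */
        static String decode(String lexical, String langFieldValue) {
            Objects.requireNonNull(lexical, "lexical");
            String quoted = "\"" + escape(lexical) + "\"";
            if (langFieldValue == null || langFieldValue.isEmpty()) {
                return quoted;
            }
            // Language tags never start with '^', so the prefix is unambiguous.
            if (langFieldValue.startsWith(DATATYPE_PREFIX)) {
                return quoted + "^^<" + langFieldValue.substring(DATATYPE_PREFIX.length()) + ">";
            }
            return quoted + "@" + langFieldValue;
        }
    
        // Minimal escaping for the illustration: backslashes, quotes and newlines only.
        private static String escape(String s) {
            return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
        }
    }
    
    class LangFieldCodecDemo {
        public static void main(String[] args) {
            System.out.println(LangFieldCodec.decode("My resource", LangFieldCodec.encode("en", null)));
            // prints "My resource"@en
        }
    }
    ```
    
    Running `LangFieldCodecDemo` prints `"My resource"@en`, the same value `?literal` is bound to in step 3; a typed value such as `"42"` with `xsd:integer` round-trips as `"42"^^<http://www.w3.org/2001/XMLSchema#integer>`.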
    
    There are unit tests for all the basic cases (simple, non-default property, language tags, datatypes).
    
    I had to change the TextIndex API slightly again, to pass the queried property from TextQueryPF to TextIndexLucene/TextIndexSolr so that they know which field to read the stored values from. Since the API was already changed recently (to return TextHit objects instead of Nodes), I don't expect another small change to hurt.
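    To illustrate why the queried property has to reach the index implementation, here is a toy in-memory index in Java. It is not Lucene and not the jena-text API: the classes `ToyTextIndex` and `Hit` (a stand-in for `TextHit`), the field names, the substring matching and the constant score are all hypothetical. The only point it makes is that stored values live in a per-property field, so the index needs the property to know which stored value to return.
    
    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Locale;
    import java.util.Map;
    
    /** Hypothetical stand-in for a TextHit: subject, score and stored literal value. */
    final class Hit {
        private final String subject;
        private final float score;
        private final String literal;
    
        Hit(String subject, float score, String literal) {
            this.subject = subject;
            this.score = score;
            this.literal = literal;
        }
    
        String subject() { return subject; }
        float score() { return score; }
        String literal() { return literal; }
    }
    
    /**
     * Toy in-memory index (not Lucene, not the jena-text API). Each document maps
     * field names to stored values; the entity map (property IRI -> field name)
     * decides which field a property's values are stored in.
     */
    final class ToyTextIndex {
        private final Map<String, String> propertyToField;
        private final String defaultField;
        private final List<Map<String, String>> docs = new ArrayList<>();
    
        ToyTextIndex(Map<String, String> propertyToField, String defaultField) {
            this.propertyToField = propertyToField;
            this.defaultField = defaultField;
        }
    
        void add(String subject, String property, String value) {
            Map<String, String> doc = new HashMap<>();
            doc.put("uri", subject);
            doc.put(fieldFor(property), value);
            docs.add(doc);
        }
    
        /** The queried property selects both the field to search and the stored value to return. */
        List<Hit> query(String property, String term) {
            String field = fieldFor(property);
            String needle = term.toLowerCase(Locale.ROOT);
            List<Hit> hits = new ArrayList<>();
            for (Map<String, String> doc : docs) {
                String value = doc.get(field);
                if (value != null && value.toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(new Hit(doc.get("uri"), 1.0f, value)); // no real scoring here
                }
            }
            return hits;
        }
    
        private String fieldFor(String property) {
            return property == null ? defaultField
                                    : propertyToField.getOrDefault(property, defaultField);
        }
    }
    ```
    
    With `rdfs:label` mapped to a `label` field, querying that property for "resource" returns a hit whose stored literal is `My resource`; querying without a property searches (and returns values from) the default field instead.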
    
    I've done a basic implementation for Solr as well. It doesn't handle language tags or datatypes (TextIndexSolr didn't have support for langField), but it should at least be able to return the lexical value. I haven't been able to test it, because the jena-text/Solr combination lacks documentation and TextIndexSolr may have suffered some bitrot: last time I tried, I couldn't get it working at all.
    
    I can write documentation for this after it has been merged. Now that I have committer access I could merge it myself, but I'd like to get a couple of +1s first.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/osma/jena jena-text-literal

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #81
    
----
commit 1592c33f21e5337ecfa74706f5a675e6c57f9967
Author: Osma Suominen <os...@aalto.fi>
Date:   2015-06-26T06:53:10Z

    jena-text stored literals: initial functionality and tests for Lucene

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request: jena-text stored literals: initial functionalit...

Posted by osma <gi...@git.apache.org>.
Github user osma commented on the pull request:

    https://github.com/apache/jena/pull/81#issuecomment-115768545
  
    @afs Sure, now there is one: [JENA-978](https://issues.apache.org/jira/browse/JENA-978)



Re: [GitHub] jena pull request: jena-text stored literals: initial functionalit...

Posted by Andy Seaborne <an...@apache.org>.
Is there a JIRA to tie this to?  If not, could you create one?

That way we can use JIRA, with all its filtering capabilities, to track changes and additions.

	Andy


[GitHub] jena pull request: jena-text stored literals: initial functionalit...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/81

