Posted to users@jena.apache.org by Laura Morales <la...@mail.com> on 2017/04/07 09:33:04 UTC

DOAP retrieve license info

I'm experimenting with Fuseki and the DOAP files of the Apache projects.
I've run a query to return the name, license, and description of SpamAssassin, and this is the result:

{
  "head": {
    "vars": [ "name" , "license" , "description" ]
  } ,
  "results": {
    "bindings": [
      {
        "name": { "type": "literal" , "xml:lang": "en" , "value": "Apache SpamAssassin" } ,
        "license": { "type": "uri" , "value": "http://usefulinc.com/doap/licenses/asl20" } ,
        "description": { "type": "literal" , "xml:lang": "en" , "value": "Apache SpamAssassin is an extensible email filter which is used to identify spam. Using its rule base, it uses a wide range of advanced heuristic and statistical analysis tests on mail headers and body text to identify \"spam\", also known as unsolicited bulk email. Once identified, the mail can then be optionally tagged as spam for later filtering. It provides a command line tool to perform filtering, a client-server system to filter large volumes of mail, and Mail::SpamAssassin, a set of Perl modules." }
      }
    ]
  }
}

and this is the corresponding DOAP file: https://spamassassin.apache.org/doap.rdf

I'm a bit confused about the meaning of the URI <http://usefulinc.com/doap/licenses/asl20>. Does this mean that there used to be the content of the asl20 license on that URL, and now the link is broken since there is nothing there? Or does it represent the "subject" of some other resource in some other graph where I can find more information about that license (in which case, where do I find said graph)?

Re: DOAP retrieve license info

Posted by Laura Morales <la...@mail.com>.
Thank you for the clear explanation.



Re: DOAP retrieve license info

Posted by "A. Soroka" <aj...@virginia.edu>.
It's hard to tell what you are doing without seeing your query. When you ask a question about SPARQL, it's a good idea to always include the query, some sample data and how you executed the query, even if it seems obvious.

Speculating about what you did, and assuming that you built your "license" column from the DOAP "license" property, we can examine that _predicate_ to find the meaning of the URI. In this case we get lucky because DOAP is a well-known vocabulary that uses an http:// namespace. Once any prefix is expanded, doap:license becomes http://usefulinc.com/ns/doap#license. Retrieving that URI, we get a schema describing the DOAP vocabulary. That schema is published in RDF/XML, but if we translate it to N-Triples (using a tool like Jena's riot) we see this triple:

<http://usefulinc.com/ns/doap#license> <http://www.w3.org/2000/01/rdf-schema#comment> "The URI of an RDF description of the license the software is distributed under."@en .
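To make those two steps concrete, here is a minimal Python sketch, using only the standard library, that expands a prefixed name and splits the N-Triples line above into its terms. The names `PREFIXES`, `expand`, and `parse_literal_triple` are illustrative assumptions and not part of Jena. The regular expression handles only this simple "URI, URI, language-tagged literal" shape, not N-Triples in general.

```python
import re

# Prefix table as it might appear in a query; both namespaces are the ones
# that appear in the triple quoted above.
PREFIXES = {
    "doap": "http://usefulinc.com/ns/doap#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(prefixed_name):
    """Turn a prefixed name such as 'doap:license' into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local

# Matches only lines of the form: <subject> <predicate> "literal"@lang .
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+"((?:[^"\\]|\\.)*)"@([A-Za-z-]+)\s*\.$'
)

def parse_literal_triple(line):
    """Split one simple N-Triples line into (subject, predicate, text, lang)."""
    match = TRIPLE.match(line.strip())
    if match is None:
        raise ValueError("not a simple URI/URI/literal triple: " + line)
    return match.groups()

line = (
    '<http://usefulinc.com/ns/doap#license> '
    '<http://www.w3.org/2000/01/rdf-schema#comment> '
    '"The URI of an RDF description of the license the software is distributed under."@en .'
)

subject, predicate, text, lang = parse_literal_triple(line)
# Print the comment only if it really is the rdfs:comment of doap:license.
if subject == expand("doap:license") and predicate == expand("rdfs:comment"):
    print(text)
```

For real data a proper RDF parser is the better choice; the point here is only that the prefixed name and the full URI in the triple are the same identifier.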

So we've found what we can take to be the meaning of the URI on the other end of that predicate: it is meant to point to an RDF description of the license. The fact that the link is broken is unfortunate, but it is not a new problem on the Web. Bringing semantics to the Web has not gotten rid of the classic problem of link rot, and it won't.

As to whether it is used as a subject in some other graph somewhere, I would be surprised if it is not. But that is not what matters for determining the meaning of its appearance in the graph you are working with. That meaning comes entirely from the predicate with which it appears, whether as subject or as object. That's how RDF works: meaning is built _up_ out of triples, not _down_ from larger contexts, and the relationship proposed in a triple is defined by the predicate used. To find the meaning of a subject or object, find the meaning of the predicate with which it is used.
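As a rough illustration of that rule, the Python sketch below treats triples as plain tuples and explains a term by looking up the description of each predicate it appears with. The subject `http://example.org/some-project` is a hypothetical stand-in for the project's URI, and `explain` is an illustrative helper, not an API of Jena or any other library.

```python
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
DOAP_LICENSE = "http://usefulinc.com/ns/doap#license"
ASL20 = "http://usefulinc.com/doap/licenses/asl20"

# Data triple: the subject is a hypothetical placeholder for the project URI.
data = [
    ("http://example.org/some-project", DOAP_LICENSE, ASL20),
]

# Schema triple: the rdfs:comment that the DOAP vocabulary gives doap:license.
schema = [
    (DOAP_LICENSE, RDFS_COMMENT,
     "The URI of an RDF description of the license the software is distributed under."),
]

def explain(term, data, schema):
    """Return (role, predicate, comment) for each data triple mentioning term."""
    comments = {s: o for s, p, o in schema if p == RDFS_COMMENT}
    found = []
    for s, p, o in data:
        if term == s:
            found.append(("subject", p, comments.get(p)))
        if term == o:
            found.append(("object", p, comments.get(p)))
    return found

for role, predicate, comment in explain(ASL20, data, schema):
    print(role, "of", predicate)
    print("  ", comment or "(no description found for this predicate)")
```

Nothing in this lookup depends on the license URI resolving to anything. The interpretation comes from the predicate alone.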

In the case of a predicate in an HTTP namespace, you can start by simply dereferencing its URI, with a browser or other tool. You might find human-centered documentation, or more RDF in which the predicate's URI features as the subject of a triple that gives it a meaning, as we found above. In the case of other vocabularies, you will have to find documentation or semantics in some other way, which will depend on the protocol and form of the URI in use. But HTTP URIs are by far the most common. This "find a URI in a graph, follow it, find another graph with more information" pattern is the essential mechanism of linked data done with RDF.
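For the "other tool" case, here is a hedged Python sketch of dereferencing with the standard library. It prepares a request that asks for RDF through the Accept header and drops the fragment, since the part after "#" is never sent to the server. The preference list in `RDF_ACCEPT` and the helpers `build_request` and `fetch` are assumptions made for illustration; a server may ignore the header or return nothing useful.

```python
import urllib.request
from urllib.parse import urldefrag

# Media types to ask for, in rough order of preference. This list is an
# assumption; the server is free to ignore it.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, */*;q=0.1"

def build_request(uri):
    """Prepare a GET request for the document that describes uri."""
    # The fragment (e.g. '#license') is not sent over HTTP, so the document
    # to fetch is the URI with its fragment removed.
    document_url, _fragment = urldefrag(uri)
    return urllib.request.Request(document_url, headers={"Accept": RDF_ACCEPT})

def fetch(uri, timeout=10):
    """Dereference uri and return (content_type, body_bytes). Needs network."""
    with urllib.request.urlopen(build_request(uri), timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()

request = build_request("http://usefulinc.com/ns/doap#license")
print(request.full_url)  # the document URL, without the fragment
```

Calling `fetch` on the predicate's URI would be the scripted equivalent of opening it in a browser. Calling it on a dead link such as the license URI would be expected to raise an error or return a page with no RDF in it.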

---
A. Soroka
The University of Virginia Library

> On Apr 7, 2017, at 5:33 AM, Laura Morales <la...@mail.com> wrote:
> 
> I'm experimenting with Fuseki and the DOAP files of the Apache projects.
> I've run a query to return name/license/description about SpamAssassin, and this is the result
> 
> {
>  "head": {
>    "vars": [ "name" , "license" , "description" ]
>  } ,
>  "results": {
>    "bindings": [
>      {
>        "name": { "type": "literal" , "xml:lang": "en" , "value": "Apache SpamAssassin" } ,
>        "license": { "type": "uri" , "value": "http://usefulinc.com/doap/licenses/asl20" } ,
>        "description": { "type": "literal" , "xml:lang": "en" , "value": "Apache SpamAssassin is an extensible email filter which is used to identify spam. Using its rule base, it uses a wide range of advanced heuristic and statistical analysis tests on mail headers and body text to identify \"spam\", also known as unsolicited bulk email. Once identified, the mail can then be optionally tagged as spam for later filtering. It provides a command line tool to perform filtering, a client-server system to filter large volumes of mail, and Mail::SpamAssassin, a set of Perl modules." }
>      }
>    ]
>  }
> }
> 
> and this is the corresponding DOAP file https://spamassassin.apache.org/doap.rdf
> 
> I'm a bit confused about the meaning of the URI <http://usefulinc.com/doap/licenses/asl20>. Does this mean that there used to be the content of the asl20 license on that URL, and now the link is broken since there is nothing there? Or does it represent the "subject" of some other resource in some other graph where I can find more information about that license (in which case, where do I find said graph)?