Posted to dev@stanbol.apache.org by Alessandro Adamou <ad...@cs.unibo.it> on 2012/02/01 19:19:26 UTC

Discrepancies in RDF/XML vs. JSONLD signatures in EntityHub

Hi,

I tried to add a MusicBrainz Referenced Site (the one maintained by 
DBTune) using the following configuration:

id = musicbrainz
name = musicbrainz
entity prefixes =
     http://dbtune.org/musicbrainz/resource/
     http://dbtune.org/musicbrainz/resource/artist/
     http://dbtune.org/musicbrainz/resource/record/
     http://dbtune.org/musicbrainz/resource/track/
     http://dbtune.org/musicbrainz/resource/master/
access URI = http://dbtune.org/musicbrainz/data/
dereferencing = coolURI
query service = http://dbtune.org/musicbrainz/sparql
query strategy = SPARQL
caching strategy = used
cache name = mbcache

I then went on to query the site for the artist Metallica, first in json

curl -H "Accept: application/json" "http://localhost:8080/entityhub/site/musicbrainz/entity?id=http://dbtune.org/musicbrainz/resource/artist/65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab"

then in rdf+xml

curl -H "Accept: application/rdf+xml" "http://localhost:8080/entityhub/site/musicbrainz/entity?id=http://dbtune.org/musicbrainz/resource/artist/65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab"

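For reference, the same two requests can be issued from Java. This is only a minimal sketch, assuming Java 11 or later (for java.net.http) and a Stanbol instance on localhost:8080 as in the commands above. The class name EntityRequestSketch and the helper entityRequest are made up for illustration, and percent-encoding the id parameter is an added precaution that the curl commands do not apply.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Stanbol.
final class EntityRequestSketch {
    // Endpoint of the "musicbrainz" referenced site, as used in the curl commands.
    static final String SITE_ENDPOINT =
            "http://localhost:8080/entityhub/site/musicbrainz/entity";

    /** Builds a GET request for one entity, asking for the given media type. */
    static HttpRequest entityRequest(String entityUri, String accept) {
        // Percent-encode the entity URI so it is a safe query parameter value.
        String id = URLEncoder.encode(entityUri, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(SITE_ENDPOINT + "?id=" + id))
                .header("Accept", accept)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String metallica = "http://dbtune.org/musicbrainz/resource/artist/"
                + "65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab";
        HttpClient client = HttpClient.newHttpClient();
        // Same entity, two media types; compare status and body size.
        for (String accept : new String[] {"application/json", "application/rdf+xml"}) {
            HttpResponse<String> response = client.send(
                    entityRequest(metallica, accept),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(accept + " -> HTTP " + response.statusCode()
                    + ", " + response.body().length() + " characters");
        }
    }
}
```

Running main needs the server to be up; entityRequest on its own only builds the request and can be inspected offline.
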
I noticed that the rdf/xml graph has many, many more triples. More 
precisely, it includes inverse predicates, namely triples like

?x foaf:maker http://dbtune.org/musicbrainz/resource/artist/65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab

I haven't checked if there is anything else missing yet.

Is this some limitation of the json-ld renderer or something?

Also, with this cache strategy am I not supposed to get the same data 
when querying the entity hub tout-court? If I issue:

curl -H "Accept: application/json" "http://localhost:8080/entityhub/entity?id=http://dbtune.org/musicbrainz/resource/artist/65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab"

I keep getting a 404

Thanks for your help

Alessandro

-- 
M.Sc. Alessandro Adamou

Alma Mater Studiorum - Università di Bologna
Department of Computer Science
Mura Anteo Zamboni 7, 40127 Bologna - Italy

Semantic Technology Laboratory (STLab)
Institute for Cognitive Science and Technology (ISTC)
National Research Council (CNR)
Via Nomentana 56, 00161 Rome - Italy


"As for the charges against me, I am unconcerned. I am beyond their timid, lying morality, and so I am beyond caring."
(Col. Walter E. Kurtz)

Not sent from my iSnobTechDevice


Re: Discrepancies in RDF/XML vs. JSONLD signatures in EntityHub

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

On Thu, Feb 2, 2012 at 11:53 AM, Alessandro Adamou <ad...@cs.unibo.it> wrote:
>> The entityhub does not support incoming triples. Therefore it is expected
>> that those are missing in the Entityhub specific "application/json"
>> serialization.
>> If you choose a RDF backed serialization such triples may be present if
>> the remote service returns them. This may be the case for the "coolURI"
>> dereferencer.
>
>
> Ok but I still don't understand why it is the acceptable MIME-type that
> makes the difference: does the entityhub json writer filter incoming triples
> or something?
>

If the Dereferencer (like the CoolURI dereferencer you configured)
returns the data as RDF, then the Entityhub streams those results
directly to serializers that also use an RDF graph as source. The
"application/json" serializer, by contrast, operates directly on the
Entityhub Representation interface.

Because of that you do not see incoming triples with
"application/json", but you do see all data returned by the CoolURI
dereferencer with an RDF-based serialization.
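The difference can be illustrated without any Stanbol classes. The following is a rough sketch, assuming triples are held as plain strings; the class IncomingTriplesSketch, its Triple type, and the sample data are all invented for illustration and do not reflect how the Entityhub is implemented. A Representation-style view corresponds to outgoing(), while an RDF serializer that streams the whole dereferenced graph would also emit what incoming() returns.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration, not Entityhub code.
final class IncomingTriplesSketch {
    /** A bare-bones triple; all three parts are kept as plain strings. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Triples whose subject is the entity: the fields of the entity itself. */
    static List<Triple> outgoing(List<Triple> graph, String entity) {
        return graph.stream()
                .filter(t -> t.subject.equals(entity))
                .collect(Collectors.toList());
    }

    /** Triples whose object is the entity: visible only in the full graph. */
    static List<Triple> incoming(List<Triple> graph, String entity) {
        return graph.stream()
                .filter(t -> t.object.equals(entity))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String artist = "http://dbtune.org/musicbrainz/resource/artist/"
                + "65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab";
        // Made-up sample graph: one outgoing and one incoming triple.
        List<Triple> graph = List.of(
                new Triple(artist, "foaf:name", "Metallica"),
                new Triple("http://example.org/track/1", "foaf:maker", artist));

        System.out.println("outgoing: " + outgoing(graph, artist).size());
        System.out.println("incoming: " + incoming(graph, artist).size());
    }
}
```
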

>
>> Note that the "application/json" returned by the Entityhub is NOT json-ld
>> but an own JSON serialization.
>
>
> I see, so it's not possible to serialize to json-ld at all?
>
The current "json-ld" serializer still has issues (completeness and
performance) that would make its usage questionable.
As soon as these issues are solved, the Entityhub will use json-ld for
"application/json" requests. The current json format will then use a
mime type such as "application/entityhub+json".

> It might not be a problem for me as I am probably going to use the EntityHub
> Java API and handle Representation objects. I just use the REST API to try
> out the EntityHub itself.
>
On the JavaAPI level you can make a check like

if (representation instanceof RdfRepresentation) {
    // Only RDF-backed representations expose their underlying graph.
    ((RdfRepresentation) representation).getGraph();
}

This would give you the graph that gets serialized as RDF.


If you really need to check incoming triples, then you can use LDPath for that

e.g. go to http://dev.iks-project.eu:8081/entityhub/site/dbpedia/ldpath

and use:

Context: http://dbpedia.org/resource/Category:Host_cities_of_the_Summer_Olympic_Games

LD-Path:

schema:name = rdfs:label[@en];
members = ^dc:subject :: xsd:anyURI;

The '^{property}' syntax allows you to traverse inverse relations.
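To make the semantics of the inverse step concrete, here is a small sketch of what "dc:subject" and "^dc:subject" select from the context node. It is not LDPath itself, only an assumption-laden illustration over an in-memory list of triples; the class InversePathSketch, the method step, and the example.org city URIs are invented.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a single path step, not the LDPath engine.
final class InversePathSketch {
    /**
     * Follows one property from the context node.
     * inverse == false: context is the subject, results are the objects.
     * inverse == true:  context is the object, results are the subjects.
     * Each triple is a String[3] of {subject, predicate, object}.
     */
    static Set<String> step(List<String[]> triples, String context,
                            String property, boolean inverse) {
        Set<String> results = new LinkedHashSet<>();
        for (String[] t : triples) {
            if (!t[1].equals(property)) {
                continue; // different property, not part of this step
            }
            if (!inverse && t[0].equals(context)) {
                results.add(t[2]);
            }
            if (inverse && t[2].equals(context)) {
                results.add(t[0]);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String category = "http://dbpedia.org/resource/"
                + "Category:Host_cities_of_the_Summer_Olympic_Games";
        // Made-up sample data, not actual DBpedia content.
        List<String[]> triples = List.of(
                new String[] {"http://example.org/cityA", "dc:subject", category},
                new String[] {"http://example.org/cityB", "dc:subject", category},
                new String[] {category, "rdfs:label", "Host cities"});

        // "^dc:subject": everything that points at the category.
        System.out.println(step(triples, category, "dc:subject", true));
        // "dc:subject": what the category itself points at (nothing here).
        System.out.println(step(triples, category, "dc:subject", false));
    }
}
```
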

I hope this makes things clearer

best
Rupert



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Discrepancies in RDF/XML vs. JSONLD signatures in EntityHub

Posted by Alessandro Adamou <ad...@cs.unibo.it>.
Thanks Rupert! Further questions below:

> Parsing "http://dbtune.org/musicbrainz/resource/" would be enough as 
> it will be matched as "http://dbtune.org/musicbrainz/resource/*". 

Ok will do. I just wasn't sure whether it was using wildcards like that.

> The entityhub does not support incoming triples. Therefore it is expected that those are missing in the Entityhub specific "application/json" serialization.
> If you choose a RDF backed serialization such triples may be present if the remote service returns them. This may be the case for the "coolURI" dereferencer.

Ok but I still don't understand why it is the acceptable MIME-type that 
makes the difference: does the entityhub json writer filter incoming 
triples or something?

> Note that the "application/json" returned by the Entityhub is NOT json-ld but an own JSON serialization.

I see, so it's not possible to serialize to json-ld at all?

It might not be a problem for me as I am probably going to use the 
EntityHub Java API and handle Representation objects. I just use the 
REST API to try out the EntityHub itself.

>  From the cache you will get no incoming triples regardless of the chosen Accept mime type. This is simple because incoming triples will not get stored in the cache.

Ok then I will be expecting that.

> Have you configured a Yard and a Cache with the name "mbcache" as noted in the above configuration?

Ouch, no I will try that now. So I should:

* create a Solr Yard "mbcache". Do I have to create a musicbrainz Solr 
index manually beforehand? Can I use the existing Stanbol managed Solr 
server for that?
* create a Cache Configuration using "mbcache" as a Yard

best,

Alessandro


Re: Discrepancies in RDF/XML vs. JSONLD signatures in EntityHub

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Alessandro
On 01.02.2012, at 19:19, Alessandro Adamou wrote:
> 
> I tried to add a MusicBrainz Referenced Site (the one maintained by DBTune) using the following configuration:
> 
> id = musicbrainz
> name = musicbrainz
> entity prefixes =
>    http://dbtune.org/musicbrainz/resource/
>    http://dbtune.org/musicbrainz/resource/artist/
>    http://dbtune.org/musicbrainz/resource/record/
>    http://dbtune.org/musicbrainz/resource/track/
>    http://dbtune.org/musicbrainz/resource/master/

Specifying "http://dbtune.org/musicbrainz/resource/" would be enough, as it will be matched as "http://dbtune.org/musicbrainz/resource/*".
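As a rough illustration of that "prefix*" behaviour (the actual matching code of the Entityhub is not shown in this thread, so this is only an assumed equivalent using String.startsWith), a single configured prefix already covers the artist, record, track and master namespaces. The class PrefixMatchSketch and the method isManaged are hypothetical names.

```java
import java.util.List;

// Hypothetical illustration of "prefix*" matching, not Entityhub code.
final class PrefixMatchSketch {
    /** True if the entity URI starts with any of the configured prefixes. */
    static boolean isManaged(List<String> prefixes, String entityUri) {
        return prefixes.stream().anyMatch(entityUri::startsWith);
    }

    public static void main(String[] args) {
        // Only the shortest prefix from the configuration above.
        List<String> prefixes = List.of("http://dbtune.org/musicbrainz/resource/");

        String artist = "http://dbtune.org/musicbrainz/resource/artist/"
                + "65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab";
        String other = "http://dbpedia.org/resource/Metallica";

        System.out.println(isManaged(prefixes, artist)); // covered by the prefix
        System.out.println(isManaged(prefixes, other));  // different namespace
    }
}
```
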

> access URI = http://dbtune.org/musicbrainz/data/
> dereferencing = coolURI
> query service = http://dbtune.org/musicbrainz/sparql
> query strategy = SPARQL
> caching strategy = used
> cache name = mbcache
> 
> I then went on to query the site for the artist Metallica, first in json
> 
> curl -H "Accept: application/json" "http://localhost:8080/entityhub/site/musicbrainz/entity?id=http://dbtune.org/musicbrainz/resource/artist/65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab"
> 
> then in rdf+xml
> 
> curl -H "Accept: application/rdf+xml" "http://localhost:8080/entityhub/site/musicbrainz/entity?id=http://dbtune.org/musicbrainz/resource/artist/65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab"
> 
> I noticed that the rdf/xml graph as many, many more triples. More precisely, it includes inverse predicates, namely triples like
> 
> ?x foaf:maker http://dbtune.org/musicbrainz/resource/artist/65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab
> 

The Entityhub does not support incoming triples. Therefore it is expected that those are missing in the Entityhub-specific "application/json" serialization.
If you choose an RDF-backed serialization, such triples may be present if the remote service returns them. This may be the case for the "coolURI" dereferencer.

> I haven't checked if there is anything else missing yet.

I would expect all incoming triples to be "missing". If you just look at outgoing triples, the two serializations should represent the same information.

> 
> Is this some limitation of the json-ld renderer or something?
> 

Note that the "application/json" returned by the Entityhub is NOT json-ld but the Entityhub's own JSON serialization.

> Also, with this cache strategy am I not supposed to get the same data when querying the entity hub tout-court? If I issue:
> 
> curl -H "Accept: application/json" "http://localhost:8080/entityhub/entity?id=http://dbtune.org/musicbrainz/resource/artist/65f4f0c5-ef9e-490c-aee3-909e7ae6b2ab"
> 
From the cache you will get no incoming triples regardless of the chosen Accept mime type. This is simply because incoming triples will not get stored in the cache.

> I keep getting a 404

Have you configured a Yard and a Cache with the name "mbcache" as noted in the above configuration?

best
Rupert